Overview

Dataset statistics

Number of variables: 17
Number of observations: 105840
Missing cells: 317520
Missing cells (%): 17.6%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 13.7 MiB
Average record size in memory: 136.0 B
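
The headline figures above can be cross-checked against each other. The following is a minimal sketch in plain Python that uses only numbers quoted in this report (the per-column missing counts come from the Alerts list). The 8-bytes-per-cell figure is an assumption, suggested by the 136.0 B average record size over 17 variables; the report does not state it.

```python
# Figures copied from the report.
n_vars = 17
n_rows = 105840
missing_per_column = {
    "VALUE": 3024,
    "STATUS": 102816,
    "SYMBOL": 105840,
    "TERMINATED": 105840,
}

total_cells = n_vars * n_rows
missing_cells = sum(missing_per_column.values())
missing_pct = round(100 * missing_cells / total_cells, 1)

# Assumption: every cell occupies 8 bytes (a 64-bit number or an object pointer).
bytes_per_record = n_vars * 8
total_mib = round(bytes_per_record * n_rows / 2**20, 1)

print(missing_cells, missing_pct, bytes_per_record, total_mib)
```

Under that assumption the sketch reproduces the reported missing-cell count and percentage, the average record size and the total size in memory.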

Variable types

Numeric: 2
Categorical: 11
Text: 2
Unsupported: 2

Alerts

STATUS has constant value "" (Constant)
GEO is highly overall correlated with DGUID (High correlation)
DGUID is highly overall correlated with GEO (High correlation)
Indicators is highly overall correlated with UOM and 4 other fields (High correlation)
UOM is highly overall correlated with Indicators and 1 other field (High correlation)
UOM_ID is highly overall correlated with Indicators and 1 other field (High correlation)
SCALAR_FACTOR is highly overall correlated with Indicators and 1 other field (High correlation)
SCALAR_ID is highly overall correlated with Indicators and 1 other field (High correlation)
DECIMALS is highly overall correlated with Indicators (High correlation)
VALUE has 3024 (2.9%) missing values (Missing)
STATUS has 102816 (97.1%) missing values (Missing)
SYMBOL has 105840 (100.0%) missing values (Missing)
TERMINATED has 105840 (100.0%) missing values (Missing)
GEO is uniformly distributed (Uniform)
DGUID is uniformly distributed (Uniform)
Sector is uniformly distributed (Uniform)
Characteristics is uniformly distributed (Uniform)
Indicators is uniformly distributed (Uniform)
SYMBOL is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
TERMINATED is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
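
The high-correlation alerts among Indicators, UOM, UOM_ID, SCALAR_FACTOR and SCALAR_ID are what one would expect if these columns act as lookups of one another. The sketch below uses only the five sample rows listed for each variable in this report. It assumes that the "1st row" to "5th row" samples of different variables refer to the same five records, which the report does not say explicitly. The helper `determines` is a hypothetical name introduced for illustration.

```python
# First five sample rows of each variable, as listed in the report.
rows = [
    # (Indicators, UOM, UOM_ID, SCALAR_FACTOR, SCALAR_ID)
    ("Number of jobs", "Jobs", 190, "units", 0),
    ("Hours worked", "Hours", 152, "thousands", 3),
    ("Wages and salaries", "Dollars", 81, "millions", 6),
    ("Average annual hours worked", "Hours", 152, "units", 0),
    ("Average weekly hours worked", "Hours", 152, "units", 0),
]

def determines(rows, src, dst):
    """Return True if each value in column `src` maps to exactly one value in column `dst`."""
    seen = {}
    for row in rows:
        if seen.setdefault(row[src], row[dst]) != row[dst]:
            return False
    return True

uom_pairs = determines(rows, 1, 2) and determines(rows, 2, 1)      # UOM <-> UOM_ID
scalar_pairs = determines(rows, 3, 4) and determines(rows, 4, 3)   # SCALAR_FACTOR <-> SCALAR_ID
indicator_fixes_uom = determines(rows, 0, 1)                       # Indicators -> UOM
uom_fixes_indicator = determines(rows, 1, 0)                       # UOM -> Indicators

print(uom_pairs, scalar_pairs, indicator_fixes_uom, uom_fixes_indicator)
```

On these rows each indicator fixes its unit, while a unit such as Hours is shared by several indicators. Five rows are consistent with the alerts but are not proof for the full table.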

Reproduction

Analysis started: 2023-11-27 20:36:36.178251
Analysis finished: 2023-11-27 20:36:46.037693
Duration: 9.86 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

REF_DATE
Real number (ℝ)

Distinct: 12
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2015.5
Minimum: 2010
Maximum: 2021
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 827.0 KiB

Quantile statistics

Minimum: 2010
5-th percentile: 2010
Q1: 2012.75
Median: 2015.5
Q3: 2018.25
95-th percentile: 2021
Maximum: 2021
Range: 11
Interquartile range (IQR): 5.5

Descriptive statistics

Standard deviation: 3.4520688
Coefficient of variation (CV): 0.0017127605
Kurtosis: -1.216784
Mean: 2015.5
Median Absolute Deviation (MAD): 3
Skewness: 0
Sum: 2.1332052 × 10^8
Variance: 11.916779
Monotonicity: Increasing
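
Because REF_DATE takes each of its 12 values exactly 8820 times, the statistics above can be reproduced from the frequency table alone. The sketch below is a minimal check in plain Python. It assumes the report uses the sample variance (n - 1 denominator) and linearly interpolated quartiles; the report does not state either, but the reported figures are consistent with both.

```python
import math
import statistics

# Twelve years, each appearing 8820 times (counts from the frequency table).
years = [year for year in range(2010, 2022) for _ in range(8820)]

n = len(years)
total = sum(years)
mean = total / n
# Sample variance: squared deviations divided by n - 1.
variance = sum((y - mean) ** 2 for y in years) / (n - 1)
std = math.sqrt(variance)
cv = std / mean
# "inclusive" interpolates linearly between the two closest order statistics.
q1, median, q3 = statistics.quantiles(years, n=4, method="inclusive")

print(n, total, mean, variance, std, cv, q1, median, q3)
```

The quartiles land on .75 and .25 fractions because each one falls between the last row of one year and the first row of the next.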
Histogram with fixed size bins (bins=12)
Common values

Value             Count  Frequency (%)
2010              8820   8.3%
2011              8820   8.3%
2012              8820   8.3%
2013              8820   8.3%
2014              8820   8.3%
2015              8820   8.3%
2016              8820   8.3%
2017              8820   8.3%
2018              8820   8.3%
2019              8820   8.3%
Other values (2)  17640  16.7%

Minimum 10 values

Value  Count  Frequency (%)
2010   8820   8.3%
2011   8820   8.3%
2012   8820   8.3%
2013   8820   8.3%
2014   8820   8.3%
2015   8820   8.3%
2016   8820   8.3%
2017   8820   8.3%
2018   8820   8.3%
2019   8820   8.3%

Maximum 10 values

Value  Count  Frequency (%)
2021   8820   8.3%
2020   8820   8.3%
2019   8820   8.3%
2018   8820   8.3%
2017   8820   8.3%
2016   8820   8.3%
2015   8820   8.3%
2014   8820   8.3%
2013   8820   8.3%
2012   8820   8.3%

GEO
Categorical

HIGH CORRELATION  UNIFORM 

Distinct: 14
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB
Canada: 7560
Newfoundland and Labrador: 7560
Prince Edward Island: 7560
Nova Scotia: 7560
New Brunswick: 7560
Other values (9): 68040

Length

Max length: 25
Median length: 14.5
Mean length: 11.714286
Min length: 5

Characters and Unicode

Total characters: 1239840
Distinct characters: 34
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
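
As a rough illustration of the category counts reported in this section, the sketch below tallies Unicode general categories for one GEO value from the report. It uses Python's standard `unicodedata` module as a stand-in; this is not necessarily what the profiling tool uses internally, and the standard module does not expose script or block names. `category_counts` and `CATEGORY_NAMES` are hypothetical names introduced for illustration.

```python
import unicodedata
from collections import Counter

# Readable names for the three general categories that appear in GEO.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Zs": "Space Separator",
}

def category_counts(text):
    """Count the characters of `text` by Unicode general category."""
    counts = Counter(unicodedata.category(ch) for ch in text)
    return {CATEGORY_NAMES.get(code, code): n for code, n in counts.items()}

counts = category_counts("Newfoundland and Labrador")
print(counts)
```

The three categories found here are the same three listed for GEO, and the 25 characters counted match the reported maximum length.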

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Canada
2nd row: Canada
3rd row: Canada
4th row: Canada
5th row: Canada

Common Values

Value                      Count  Frequency (%)
Canada                     7560   7.1%
Newfoundland and Labrador  7560   7.1%
Prince Edward Island       7560   7.1%
Nova Scotia                7560   7.1%
New Brunswick              7560   7.1%
Quebec                     7560   7.1%
Ontario                    7560   7.1%
Manitoba                   7560   7.1%
Saskatchewan               7560   7.1%
Alberta                    7560   7.1%
Other values (4)           30240  28.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
canada 7560
 
4.5%
newfoundland 7560
 
4.5%
territories 7560
 
4.5%
northwest 7560
 
4.5%
yukon 7560
 
4.5%
columbia 7560
 
4.5%
british 7560
 
4.5%
alberta 7560
 
4.5%
saskatchewan 7560
 
4.5%
manitoba 7560
 
4.5%
Other values (12) 90720
54.5%

Most occurring characters

ValueCountFrequency (%)
a 151200
 
12.2%
n 90720
 
7.3%
r 90720
 
7.3%
t 75600
 
6.1%
e 75600
 
6.1%
o 75600
 
6.1%
i 75600
 
6.1%
d 60480
 
4.9%
60480
 
4.9%
u 52920
 
4.3%
Other values (24) 430920
34.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1020600
82.3%
Uppercase Letter 158760
 
12.8%
Space Separator 60480
 
4.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 151200
14.8%
n 90720
 
8.9%
r 90720
 
8.9%
t 75600
 
7.4%
e 75600
 
7.4%
o 75600
 
7.4%
i 75600
 
7.4%
d 60480
 
5.9%
u 52920
 
5.2%
s 45360
 
4.4%
Other values (9) 226800
22.2%
Uppercase Letter
ValueCountFrequency (%)
N 37800
23.8%
B 15120
 
9.5%
S 15120
 
9.5%
C 15120
 
9.5%
I 7560
 
4.8%
E 7560
 
4.8%
P 7560
 
4.8%
L 7560
 
4.8%
Q 7560
 
4.8%
O 7560
 
4.8%
Other values (4) 30240
19.0%
Space Separator
ValueCountFrequency (%)
60480
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1179360
95.1%
Common 60480
 
4.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 151200
 
12.8%
n 90720
 
7.7%
r 90720
 
7.7%
t 75600
 
6.4%
e 75600
 
6.4%
o 75600
 
6.4%
i 75600
 
6.4%
d 60480
 
5.1%
u 52920
 
4.5%
s 45360
 
3.8%
Other values (23) 385560
32.7%
Common
ValueCountFrequency (%)
60480
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1239840
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 151200
 
12.2%
n 90720
 
7.3%
r 90720
 
7.3%
t 75600
 
6.1%
e 75600
 
6.1%
o 75600
 
6.1%
i 75600
 
6.1%
d 60480
 
4.9%
60480
 
4.9%
u 52920
 
4.3%
Other values (24) 430920
34.8%

DGUID
Categorical

HIGH CORRELATION  UNIFORM 

Distinct: 14
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB
2016A000011124: 7560
2016A000210: 7560
2016A000211: 7560
2016A000212: 7560
2016A000213: 7560
Other values (9): 68040

Length

Max length: 14
Median length: 11
Mean length: 11.214286
Min length: 11

Characters and Unicode

Total characters: 1186920
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2016A000011124
2nd row: 2016A000011124
3rd row: 2016A000011124
4th row: 2016A000011124
5th row: 2016A000011124

Common Values

Value             Count  Frequency (%)
2016A000011124    7560   7.1%
2016A000210       7560   7.1%
2016A000211       7560   7.1%
2016A000212       7560   7.1%
2016A000213       7560   7.1%
2016A000224       7560   7.1%
2016A000235       7560   7.1%
2016A000246       7560   7.1%
2016A000247       7560   7.1%
2016A000248       7560   7.1%
Other values (4)  30240  28.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2016a000011124 7560
 
7.1%
2016a000210 7560
 
7.1%
2016a000211 7560
 
7.1%
2016a000212 7560
 
7.1%
2016a000213 7560
 
7.1%
2016a000224 7560
 
7.1%
2016a000235 7560
 
7.1%
2016a000246 7560
 
7.1%
2016a000247 7560
 
7.1%
2016a000248 7560
 
7.1%
Other values (4) 30240
28.6%

Most occurring characters

ValueCountFrequency (%)
0 446040
37.6%
2 234360
19.7%
1 173880
 
14.6%
6 136080
 
11.5%
A 105840
 
8.9%
4 37800
 
3.2%
3 15120
 
1.3%
5 15120
 
1.3%
7 7560
 
0.6%
8 7560
 
0.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1081080
91.1%
Uppercase Letter 105840
 
8.9%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 446040
41.3%
2 234360
21.7%
1 173880
 
16.1%
6 136080
 
12.6%
4 37800
 
3.5%
3 15120
 
1.4%
5 15120
 
1.4%
7 7560
 
0.7%
8 7560
 
0.7%
9 7560
 
0.7%
Uppercase Letter
ValueCountFrequency (%)
A 105840
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 1081080
91.1%
Latin 105840
 
8.9%

Most frequent character per script

Common
ValueCountFrequency (%)
0 446040
41.3%
2 234360
21.7%
1 173880
 
16.1%
6 136080
 
12.6%
4 37800
 
3.5%
3 15120
 
1.4%
5 15120
 
1.4%
7 7560
 
0.7%
8 7560
 
0.7%
9 7560
 
0.7%
Latin
ValueCountFrequency (%)
A 105840
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1186920
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 446040
37.6%
2 234360
19.7%
1 173880
 
14.6%
6 136080
 
11.5%
A 105840
 
8.9%
4 37800
 
3.2%
3 15120
 
1.3%
5 15120
 
1.3%
7 7560
 
0.6%
8 7560
 
0.6%

Sector
Categorical

UNIFORM 

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB
Total non-profit institutions: 21168
Total non-profit institutions excluding governments: 21168
Non-profit institutions serving households (community organizations): 21168
Business non-profit institutions: 21168
Government non-profit institutions: 21168

Length

Max length: 68
Median length: 34
Mean length: 42.8
Min length: 29

Characters and Unicode

Total characters: 4529952
Distinct characters: 29
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Total non-profit institutions
2nd row: Total non-profit institutions
3rd row: Total non-profit institutions
4th row: Total non-profit institutions
5th row: Total non-profit institutions

Common Values

Value                                                                 Count  Frequency (%)
Total non-profit institutions                                         21168  20.0%
Total non-profit institutions excluding governments                   21168  20.0%
Non-profit institutions serving households (community organizations)  21168  20.0%
Business non-profit institutions                                      21168  20.0%
Government non-profit institutions                                    21168  20.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
non-profit 105840
25.0%
institutions 105840
25.0%
total 42336
 
10.0%
excluding 21168
 
5.0%
governments 21168
 
5.0%
serving 21168
 
5.0%
households 21168
 
5.0%
community 21168
 
5.0%
organizations 21168
 
5.0%
business 21168
 
5.0%

Most occurring characters

ValueCountFrequency (%)
n 613872
13.6%
i 550368
12.1%
t 550368
12.1%
o 508032
11.2%
s 381024
 
8.4%
317520
 
7.0%
r 190512
 
4.2%
u 190512
 
4.2%
e 169344
 
3.7%
f 105840
 
2.3%
Other values (19) 952560
21.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3958416
87.4%
Space Separator 317520
 
7.0%
Dash Punctuation 105840
 
2.3%
Uppercase Letter 105840
 
2.3%
Open Punctuation 21168
 
0.5%
Close Punctuation 21168
 
0.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 613872
15.5%
i 550368
13.9%
t 550368
13.9%
o 508032
12.8%
s 381024
9.6%
r 190512
 
4.8%
u 190512
 
4.8%
e 169344
 
4.3%
f 105840
 
2.7%
p 105840
 
2.7%
Other values (11) 592704
15.0%
Uppercase Letter
ValueCountFrequency (%)
T 42336
40.0%
N 21168
20.0%
B 21168
20.0%
G 21168
20.0%
Space Separator
ValueCountFrequency (%)
317520
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 105840
100.0%
Open Punctuation
ValueCountFrequency (%)
( 21168
100.0%
Close Punctuation
ValueCountFrequency (%)
) 21168
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 4064256
89.7%
Common 465696
 
10.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 613872
15.1%
i 550368
13.5%
t 550368
13.5%
o 508032
12.5%
s 381024
9.4%
r 190512
 
4.7%
u 190512
 
4.7%
e 169344
 
4.2%
f 105840
 
2.6%
p 105840
 
2.6%
Other values (15) 698544
17.2%
Common
ValueCountFrequency (%)
317520
68.2%
- 105840
 
22.7%
( 21168
 
4.5%
) 21168
 
4.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4529952
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 613872
13.6%
i 550368
12.1%
t 550368
12.1%
o 508032
11.2%
s 381024
 
8.4%
317520
 
7.0%
r 190512
 
4.2%
u 190512
 
4.2%
e 169344
 
3.7%
f 105840
 
2.3%
Other values (19) 952560
21.0%

Characteristics
Categorical

UNIFORM 

Distinct: 18
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB
Male employees: 5880
Female employees: 5880
Immigrant employees: 5880
Non-immigrant employees: 5880
Indigenous identity employees: 5880
Other values (13): 76440

Length

Max length: 33
Median length: 28
Mean length: 19.5
Min length: 14

Characters and Unicode

Total characters: 2063880
Distinct characters: 37
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Male employees
2nd row: Male employees
3rd row: Male employees
4th row: Male employees
5th row: Male employees

Common Values

Value                              Count  Frequency (%)
Male employees                     5880   5.6%
Female employees                   5880   5.6%
Immigrant employees                5880   5.6%
Non-immigrant employees            5880   5.6%
Indigenous identity employees      5880   5.6%
Non-indigenous identity employees  5880   5.6%
Visible minority                   5880   5.6%
Not a visible minority             5880   5.6%
High school diploma and less       5880   5.6%
Trade certificate                  5880   5.6%
Other values (8)                   47040  44.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
employees 35280
 
10.3%
years 35280
 
10.3%
to 29400
 
8.6%
and 17640
 
5.2%
identity 11760
 
3.4%
visible 11760
 
3.4%
minority 11760
 
3.4%
diploma 11760
 
3.4%
male 5880
 
1.7%
34 5880
 
1.7%
Other values (28) 164640
48.3%

Most occurring characters

ValueCountFrequency (%)
e 264600
12.8%
235200
 
11.4%
i 152880
 
7.4%
o 147000
 
7.1%
s 117600
 
5.7%
a 105840
 
5.1%
l 99960
 
4.8%
y 99960
 
4.8%
t 99960
 
4.8%
r 94080
 
4.6%
Other values (27) 646800
31.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1617000
78.3%
Space Separator 235200
 
11.4%
Decimal Number 129360
 
6.3%
Uppercase Letter 70560
 
3.4%
Dash Punctuation 11760
 
0.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 264600
16.4%
i 152880
9.5%
o 147000
9.1%
s 117600
 
7.3%
a 105840
 
6.5%
l 99960
 
6.2%
y 99960
 
6.2%
t 99960
 
6.2%
r 94080
 
5.8%
n 94080
 
5.8%
Other values (10) 341040
21.1%
Uppercase Letter
ValueCountFrequency (%)
N 17640
25.0%
I 11760
16.7%
H 5880
 
8.3%
T 5880
 
8.3%
C 5880
 
8.3%
U 5880
 
8.3%
V 5880
 
8.3%
F 5880
 
8.3%
M 5880
 
8.3%
Decimal Number
ValueCountFrequency (%)
5 47040
36.4%
4 41160
31.8%
2 11760
 
9.1%
3 11760
 
9.1%
6 11760
 
9.1%
1 5880
 
4.5%
Space Separator
ValueCountFrequency (%)
235200
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 11760
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1687560
81.8%
Common 376320
 
18.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 264600
15.7%
i 152880
 
9.1%
o 147000
 
8.7%
s 117600
 
7.0%
a 105840
 
6.3%
l 99960
 
5.9%
y 99960
 
5.9%
t 99960
 
5.9%
r 94080
 
5.6%
n 94080
 
5.6%
Other values (19) 411600
24.4%
Common
ValueCountFrequency (%)
235200
62.5%
5 47040
 
12.5%
4 41160
 
10.9%
2 11760
 
3.1%
3 11760
 
3.1%
- 11760
 
3.1%
6 11760
 
3.1%
1 5880
 
1.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2063880
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 264600
12.8%
235200
 
11.4%
i 152880
 
7.4%
o 147000
 
7.1%
s 117600
 
5.7%
a 105840
 
5.1%
l 99960
 
4.8%
y 99960
 
4.8%
t 99960
 
4.8%
r 94080
 
4.6%
Other values (27) 646800
31.3%

Indicators
Categorical

HIGH CORRELATION  UNIFORM 

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB
Number of jobs: 15120
Hours worked: 15120
Wages and salaries: 15120
Average annual hours worked: 15120
Average weekly hours worked: 15120
Other values (2): 30240

Length

Max length: 33
Median length: 19
Mean length: 21.428571
Min length: 12

Characters and Unicode

Total characters: 2268000
Distinct characters: 25
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Number of jobs
2nd row: Hours worked
3rd row: Wages and salaries
4th row: Average annual hours worked
5th row: Average weekly hours worked

Common Values

Value                              Count  Frequency (%)
Number of jobs                     15120  14.3%
Hours worked                       15120  14.3%
Wages and salaries                 15120  14.3%
Average annual hours worked        15120  14.3%
Average weekly hours worked        15120  14.3%
Average annual wages and salaries  15120  14.3%
Average hourly wage                15120  14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
average 60480
16.7%
hours 45360
12.5%
worked 45360
12.5%
wages 30240
8.3%
and 30240
8.3%
salaries 30240
8.3%
annual 30240
8.3%
number 15120
 
4.2%
of 15120
 
4.2%
jobs 15120
 
4.2%
Other values (3) 45360
12.5%

Most occurring characters

ValueCountFrequency (%)
e 287280
12.7%
a 257040
11.3%
257040
11.3%
r 211680
 
9.3%
s 151200
 
6.7%
o 136080
 
6.0%
u 105840
 
4.7%
g 105840
 
4.7%
n 90720
 
4.0%
l 90720
 
4.0%
Other values (15) 574560
25.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1905120
84.0%
Space Separator 257040
 
11.3%
Uppercase Letter 105840
 
4.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 287280
15.1%
a 257040
13.5%
r 211680
11.1%
s 151200
 
7.9%
o 136080
 
7.1%
u 105840
 
5.6%
g 105840
 
5.6%
n 90720
 
4.8%
l 90720
 
4.8%
w 90720
 
4.8%
Other values (10) 378000
19.8%
Uppercase Letter
ValueCountFrequency (%)
A 60480
57.1%
W 15120
 
14.3%
H 15120
 
14.3%
N 15120
 
14.3%
Space Separator
ValueCountFrequency (%)
257040
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2010960
88.7%
Common 257040
 
11.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 287280
14.3%
a 257040
12.8%
r 211680
10.5%
s 151200
 
7.5%
o 136080
 
6.8%
u 105840
 
5.3%
g 105840
 
5.3%
n 90720
 
4.5%
l 90720
 
4.5%
w 90720
 
4.5%
Other values (14) 483840
24.1%
Common
ValueCountFrequency (%)
257040
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2268000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 287280
12.7%
a 257040
11.3%
257040
11.3%
r 211680
 
9.3%
s 151200
 
6.7%
o 136080
 
6.0%
u 105840
 
4.7%
g 105840
 
4.7%
n 90720
 
4.0%
l 90720
 
4.0%
Other values (15) 574560
25.3%

UOM
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB
Hours: 45360
Dollars: 45360
Jobs: 15120

Length

Max length: 7
Median length: 5
Mean length: 5.7142857
Min length: 4

Characters and Unicode

Total characters: 604800
Distinct characters: 10
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Jobs
2nd row: Hours
3rd row: Dollars
4th row: Hours
5th row: Hours

Common Values

Value    Count  Frequency (%)
Hours    45360  42.9%
Dollars  45360  42.9%
Jobs     15120  14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
hours 45360
42.9%
dollars 45360
42.9%
jobs 15120
 
14.3%

Most occurring characters

ValueCountFrequency (%)
o 105840
17.5%
s 105840
17.5%
r 90720
15.0%
l 90720
15.0%
H 45360
7.5%
u 45360
7.5%
D 45360
7.5%
a 45360
7.5%
J 15120
 
2.5%
b 15120
 
2.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 498960
82.5%
Uppercase Letter 105840
 
17.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 105840
21.2%
s 105840
21.2%
r 90720
18.2%
l 90720
18.2%
u 45360
9.1%
a 45360
9.1%
b 15120
 
3.0%
Uppercase Letter
ValueCountFrequency (%)
H 45360
42.9%
D 45360
42.9%
J 15120
 
14.3%

Most occurring scripts

ValueCountFrequency (%)
Latin 604800
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 105840
17.5%
s 105840
17.5%
r 90720
15.0%
l 90720
15.0%
H 45360
7.5%
u 45360
7.5%
D 45360
7.5%
a 45360
7.5%
J 15120
 
2.5%
b 15120
 
2.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 604800
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 105840
17.5%
s 105840
17.5%
r 90720
15.0%
l 90720
15.0%
H 45360
7.5%
u 45360
7.5%
D 45360
7.5%
a 45360
7.5%
J 15120
 
2.5%
b 15120
 
2.5%

UOM_ID
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB
152: 45360
81: 45360
190: 15120

Length

Max length: 3
Median length: 3
Mean length: 2.5714286
Min length: 2

Characters and Unicode

Total characters: 272160
Distinct characters: 6
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 190
2nd row: 152
3rd row: 81
4th row: 152
5th row: 152

Common Values

Value  Count  Frequency (%)
152    45360  42.9%
81     45360  42.9%
190    15120  14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
152 45360
42.9%
81 45360
42.9%
190 15120
 
14.3%

Most occurring characters

ValueCountFrequency (%)
1 105840
38.9%
5 45360
16.7%
2 45360
16.7%
8 45360
16.7%
9 15120
 
5.6%
0 15120
 
5.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 272160
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 105840
38.9%
5 45360
16.7%
2 45360
16.7%
8 45360
16.7%
9 15120
 
5.6%
0 15120
 
5.6%

Most occurring scripts

ValueCountFrequency (%)
Common 272160
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1 105840
38.9%
5 45360
16.7%
2 45360
16.7%
8 45360
16.7%
9 15120
 
5.6%
0 15120
 
5.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 272160
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 105840
38.9%
5 45360
16.7%
2 45360
16.7%
8 45360
16.7%
9 15120
 
5.6%
0 15120
 
5.6%

SCALAR_FACTOR
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB
units: 75600
thousands: 15120
millions: 15120

Length

Max length: 9
Median length: 5
Mean length: 6
Min length: 5

Characters and Unicode

Total characters: 635040
Distinct characters: 11
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: units
2nd row: thousands
3rd row: millions
4th row: units
5th row: units

Common Values

Value      Count  Frequency (%)
units      75600  71.4%
thousands  15120  14.3%
millions   15120  14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
units 75600
71.4%
thousands 15120
 
14.3%
millions 15120
 
14.3%

Most occurring characters

ValueCountFrequency (%)
s 120960
19.0%
n 105840
16.7%
i 105840
16.7%
u 90720
14.3%
t 90720
14.3%
o 30240
 
4.8%
l 30240
 
4.8%
h 15120
 
2.4%
a 15120
 
2.4%
d 15120
 
2.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 635040
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
s 120960
19.0%
n 105840
16.7%
i 105840
16.7%
u 90720
14.3%
t 90720
14.3%
o 30240
 
4.8%
l 30240
 
4.8%
h 15120
 
2.4%
a 15120
 
2.4%
d 15120
 
2.4%

Most occurring scripts

ValueCountFrequency (%)
Latin 635040
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
s 120960
19.0%
n 105840
16.7%
i 105840
16.7%
u 90720
14.3%
t 90720
14.3%
o 30240
 
4.8%
l 30240
 
4.8%
h 15120
 
2.4%
a 15120
 
2.4%
d 15120
 
2.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 635040
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
s 120960
19.0%
n 105840
16.7%
i 105840
16.7%
u 90720
14.3%
t 90720
14.3%
o 30240
 
4.8%
l 30240
 
4.8%
h 15120
 
2.4%
a 15120
 
2.4%
d 15120
 
2.4%

SCALAR_ID
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB
0: 75600
3: 15120
6: 15120

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 105840
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 3
3rd row: 6
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      75600  71.4%
3      15120  14.3%
6      15120  14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0 75600
71.4%
3 15120
 
14.3%
6 15120
 
14.3%

Most occurring characters

ValueCountFrequency (%)
0 75600
71.4%
3 15120
 
14.3%
6 15120
 
14.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 105840
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 75600
71.4%
3 15120
 
14.3%
6 15120
 
14.3%

Most occurring scripts

ValueCountFrequency (%)
Common 105840
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 75600
71.4%
3 15120
 
14.3%
6 15120
 
14.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 105840
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 75600
71.4%
3 15120
 
14.3%
6 15120
 
14.3%

VECTOR
Text

Distinct: 8820
Distinct (%): 8.3%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB

Length

Max length: 11
Median length: 11
Mean length: 11
Min length: 11

Characters and Unicode

Total characters: 1164240
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: v1273033811
2nd row: v1273033812
3rd row: v1273033813
4th row: v1273033814
5th row: v1273033815

Value                Count   Frequency (%)
v1273033811          12      < 0.1%
v1273033912          12      < 0.1%
v1273034009          12      < 0.1%
v1273034008          12      < 0.1%
v1273034007          12      < 0.1%
v1273033915          12      < 0.1%
v1273033914          12      < 0.1%
v1273033913          12      < 0.1%
v1273033911          12      < 0.1%
v1273034698          12      < 0.1%
Other values (8810)  105720  99.9%
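
Every listed VECTOR value occurs 12 times, which matches the 12 distinct REF_DATE values. The arithmetic below is a minimal sketch using only distinct counts quoted in this report. It checks that the table has the size of a complete grid over GEO, Sector, Characteristics, Indicators and year. Reading each vector as one such combination is an interpretation, not something the report states.

```python
from math import prod

# Distinct counts reported for each categorical dimension.
distinct = {"GEO": 14, "Sector": 5, "Characteristics": 18, "Indicators": 7}
n_years = 12       # distinct REF_DATE values
n_rows = 105840    # number of observations
n_vectors = 8820   # distinct VECTOR values

n_series = prod(distinct.values())      # combinations of the four dimensions
rows_if_complete = n_series * n_years   # one row per combination per year

print(n_series, rows_if_complete)
```

Both products agree with the reported totals, which is also consistent with the Uniform alerts raised for these columns.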

Most occurring characters

ValueCountFrequency (%)
3 214332
18.4%
1 149892
12.9%
0 149784
12.9%
7 148584
12.8%
2 145476
12.5%
v 105840
9.1%
4 75516
 
6.5%
5 43944
 
3.8%
9 43944
 
3.8%
8 43812
 
3.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1058400
90.9%
Lowercase Letter 105840
 
9.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3 214332
20.3%
1 149892
14.2%
0 149784
14.2%
7 148584
14.0%
2 145476
13.7%
4 75516
 
7.1%
5 43944
 
4.2%
9 43944
 
4.2%
8 43812
 
4.1%
6 43116
 
4.1%
Lowercase Letter
ValueCountFrequency (%)
v 105840
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 1058400
90.9%
Latin 105840
 
9.1%

Most frequent character per script

Common
ValueCountFrequency (%)
3 214332
20.3%
1 149892
14.2%
0 149784
14.2%
7 148584
14.0%
2 145476
13.7%
4 75516
 
7.1%
5 43944
 
4.2%
9 43944
 
4.2%
8 43812
 
4.1%
6 43116
 
4.1%
Latin
ValueCountFrequency (%)
v 105840
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1164240
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3 214332
18.4%
1 149892
12.9%
0 149784
12.9%
7 148584
12.8%
2 145476
12.5%
v 105840
9.1%
4 75516
 
6.5%
5 43944
 
3.8%
9 43944
 
3.8%
8 43812
 
3.8%

Distinct: 8820
Distinct (%): 8.3%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB

Length

Max length: 9
Median length: 8.5
Mean length: 7.8571429
Min length: 7

Characters and Unicode

Total characters: 831600
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.1.1.1
2nd row: 1.1.1.2
3rd row: 1.1.1.3
4th row: 1.1.1.4
5th row: 1.1.1.5

Value                Count   Frequency (%)
1.1.1.1              12      < 0.1%
1.1.2.4              12      < 0.1%
1.1.3.3              12      < 0.1%
1.1.3.2              12      < 0.1%
1.1.3.1              12      < 0.1%
1.1.2.7              12      < 0.1%
1.1.2.6              12      < 0.1%
1.1.2.5              12      < 0.1%
1.1.2.3              12      < 0.1%
1.1.10.6             12      < 0.1%
Other values (8810)  105720  99.9%

Most occurring characters

ValueCountFrequency (%)
. 317520
38.2%
1 153888
18.5%
2 63168
 
7.6%
3 63168
 
7.6%
4 63168
 
7.6%
5 55608
 
6.7%
6 34440
 
4.1%
7 34440
 
4.1%
8 19320
 
2.3%
0 13440
 
1.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 514080
61.8%
Other Punctuation 317520
38.2%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 153888
29.9%
2 63168
12.3%
3 63168
12.3%
4 63168
12.3%
5 55608
 
10.8%
6 34440
 
6.7%
7 34440
 
6.7%
8 19320
 
3.8%
0 13440
 
2.6%
9 13440
 
2.6%
Other Punctuation
ValueCountFrequency (%)
. 317520
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 831600
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
. 317520
38.2%
1 153888
18.5%
2 63168
 
7.6%
3 63168
 
7.6%
4 63168
 
7.6%
5 55608
 
6.7%
6 34440
 
4.1%
7 34440
 
4.1%
8 19320
 
2.3%
0 13440
 
1.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 831600
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
. 317520
38.2%
1 153888
18.5%
2 63168
 
7.6%
3 63168
 
7.6%
4 63168
 
7.6%
5 55608
 
6.7%
6 34440
 
4.1%
7 34440
 
4.1%
8 19320
 
2.3%
0 13440
 
1.6%

VALUE
Real number (ℝ)

MISSING 

Distinct: 35398
Distinct (%): 34.4%
Missing: 3024
Missing (%): 2.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 26419.543
Minimum: 0
Maximum: 3857813
Zeros: 9
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 827.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 18.6075
Q1: 32
Median: 1386
Q3: 17660
95-th percentile: 85576.5
Maximum: 3857813
Range: 3857813
Interquartile range (IQR): 17628

Descriptive statistics

Standard deviation: 118041.35
Coefficient of variation (CV): 4.4679558
Kurtosis: 266.93641
Mean: 26419.543
Median Absolute Deviation (MAD): 1358
Skewness: 13.831207
Sum: 2.7163518 × 10^9
Variance: 1.393376 × 10^10
Monotonicity: Not monotonic
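
Several of the VALUE statistics are derived from one another, so they can be checked for internal consistency. The sketch below uses only figures quoted in this section. It assumes the mean and sum are taken over the non-missing rows, which the report does not state but the numbers support.

```python
# Figures copied from the VALUE section of the report.
n_rows, n_missing = 105840, 3024
mean, std = 26419.543, 118041.35
q1, q3 = 32, 17660
minimum, maximum = 0, 3857813

n_valid = n_rows - n_missing     # rows that actually carry a value
cv = std / mean                  # coefficient of variation
variance = std ** 2
total = mean * n_valid           # sum over non-missing rows (assumption)
iqr = q3 - q1
value_range = maximum - minimum

print(n_valid, cv, variance, total, iqr, value_range)
```

The number of non-missing rows, 102816, equals the number of rows where STATUS is missing, and the 3024 missing values equal the number of rows flagged "x" in STATUS.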
Histogram with fixed size bins (bins=50)
Common values

Value                 Count  Frequency (%)
30                    2040   1.9%
31                    1949   1.8%
32                    1844   1.7%
29                    1562   1.5%
33                    1476   1.4%
34                    1005   0.9%
28                    905    0.9%
35                    638    0.6%
27                    523    0.5%
36                    457    0.4%
Other values (35388)  90417  85.4%
(Missing)             3024   2.9%

Minimum 10 values

Value  Count  Frequency (%)
0      9      < 0.1%
1      219    0.2%
2      299    0.3%
3      201    0.2%
4      203    0.2%
5      157    0.1%
6      147    0.1%
7      135    0.1%
8      179    0.2%
9      145    0.1%

Maximum 10 values

Value    Count  Frequency (%)
3857813  1      < 0.1%
3690238  1      < 0.1%
3643952  1      < 0.1%
3581317  1      < 0.1%
3556730  1      < 0.1%
3544822  1      < 0.1%
3487265  1      < 0.1%
3425218  1      < 0.1%
3402343  1      < 0.1%
3372266  1      < 0.1%

STATUS
Categorical

CONSTANT  MISSING 

Distinct: 1
Distinct (%): < 0.1%
Missing: 102816
Missing (%): 97.1%
Memory size: 827.0 KiB
x: 3024

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 3024
Distinct characters: 1
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: x
2nd row: x
3rd row: x
4th row: x
5th row: x

Common Values

Value | Count | Frequency (%)
x | 3024 | 2.9%
(Missing) | 102816 | 97.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
x | 3024 | 100.0%

Most occurring characters

Value | Count | Frequency (%)
x | 3024 | 100.0%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 3024 | 100.0%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
x | 3024 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 3024 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
x | 3024 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 3024 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
x | 3024 | 100.0%

SYMBOL
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 105840
Missing (%): 100.0%
Memory size: 827.0 KiB

TERMINATED
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 105840
Missing (%): 100.0%
Memory size: 827.0 KiB

DECIMALS
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB
0: 90720
2: 15120

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 105840
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 90720 | 85.7%
2 | 15120 | 14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
0 | 90720 | 85.7%
2 | 15120 | 14.3%

Most occurring characters

Value | Count | Frequency (%)
0 | 90720 | 85.7%
2 | 15120 | 14.3%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 105840 | 100.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 90720 | 85.7%
2 | 15120 | 14.3%

Most occurring scripts

Value | Count | Frequency (%)
Common | 105840 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 90720 | 85.7%
2 | 15120 | 14.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 105840 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 90720 | 85.7%
2 | 15120 | 14.3%

Interactions

[Four interaction plots; images not preserved in this text version]

Correlations

Variable | REF_DATE | VALUE | GEO | DGUID | Sector | Characteristics | Indicators | UOM | UOM_ID | SCALAR_FACTOR | SCALAR_ID | DECIMALS
REF_DATE | 1.000 | 0.027 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
VALUE | 0.027 | 1.000 | 0.090 | 0.090 | 0.051 | 0.044 | 0.075 | 0.081 | 0.081 | 0.105 | 0.105 | 0.044
GEO | 0.000 | 0.090 | 1.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
DGUID | 0.000 | 0.090 | 1.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
Sector | 0.000 | 0.051 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
Characteristics | 0.000 | 0.044 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
Indicators | 0.000 | 0.075 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
UOM | 0.000 | 0.081 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000 | 1.000 | 0.447 | 0.447 | 0.471
UOM_ID | 0.000 | 0.081 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000 | 1.000 | 0.447 | 0.447 | 0.471
SCALAR_FACTOR | 0.000 | 0.105 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.447 | 0.447 | 1.000 | 1.000 | 0.258
SCALAR_ID | 0.000 | 0.105 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.447 | 0.447 | 1.000 | 1.000 | 0.258
DECIMALS | 0.000 | 0.044 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.471 | 0.471 | 0.258 | 0.258 | 1.000
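
The matrix does not say which association measure produced each cell. As a hedged check, the sketch below computes a plain (uncorrected) Cramér's V from the indicator-to-unit mapping visible in the sample rows of this report, with 15120 rows per indicator as reported for the Indicators variable. It reproduces the 1.000, 0.447, 0.471 and 0.258 entries among the categorical columns, which suggests, but does not prove, that a Cramér's V style measure is behind those cells. The function and variable names are illustrative.

```python
from collections import Counter
from math import sqrt

# (Indicators, UOM, SCALAR_FACTOR, DECIMALS), read off the sample rows of this report.
RECORDS = [
    ("Number of jobs", "Jobs", "units", 0),
    ("Hours worked", "Hours", "thousands", 0),
    ("Wages and salaries", "Dollars", "millions", 0),
    ("Average annual hours worked", "Hours", "units", 0),
    ("Average weekly hours worked", "Hours", "units", 0),
    ("Average annual wages and salaries", "Dollars", "units", 0),
    ("Average hourly wage", "Dollars", "units", 2),
]
ROWS_PER_INDICATOR = 15120  # each indicator appears this many times


def pair_counts(i, j):
    """Contingency counts for columns i and j of RECORDS."""
    counts = Counter()
    for record in RECORDS:
        counts[(record[i], record[j])] += ROWS_PER_INDICATOR
    return counts


def cramers_v(counts):
    """Uncorrected Cramér's V for a mapping (a, b) -> count."""
    n = sum(counts.values())
    row, col = Counter(), Counter()
    for (a, b), c in counts.items():
        row[a] += c
        col[b] += c
    chi2 = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            chi2 += (counts.get((a, b), 0) - expected) ** 2 / expected
    return sqrt(chi2 / (n * min(len(row) - 1, len(col) - 1)))


v_indicators_uom = cramers_v(pair_counts(0, 1))
v_uom_scalar = cramers_v(pair_counts(1, 2))
v_uom_decimals = cramers_v(pair_counts(1, 3))
v_scalar_decimals = cramers_v(pair_counts(2, 3))

for name, v in [
    ("Indicators vs UOM", v_indicators_uom),
    ("UOM vs SCALAR_FACTOR", v_uom_scalar),
    ("UOM vs DECIMALS", v_uom_decimals),
    ("SCALAR_FACTOR vs DECIMALS", v_scalar_decimals),
]:
    print(f"{name}: {v:.3f}")
```

Because each indicator maps to exactly one unit, scalar factor and decimals setting, Indicators scores 1.000 against all of them, which is why the alerts flag it as highly correlated with five other fields.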

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
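
One common way to compute nullity correlation, assumed here rather than confirmed by the report, is the Pearson correlation between per-column "is missing" indicators. The sketch below uses a handful of hypothetical rows in which STATUS is "x" exactly where VALUE is missing; the matching counts (3024 missing VALUE cells, 3024 non-missing STATUS cells) make that pattern plausible for this dataset, but the report does not state it.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


# Hypothetical toy rows: STATUS is set only where VALUE is suppressed.
rows = [
    {"VALUE": 30.0, "STATUS": None},
    {"VALUE": None, "STATUS": "x"},
    {"VALUE": 1386.0, "STATUS": None},
    {"VALUE": None, "STATUS": "x"},
    {"VALUE": 32.0, "STATUS": None},
]

# 1 where the cell is missing, 0 otherwise.
value_null = [int(r["VALUE"] is None) for r in rows]
status_null = [int(r["STATUS"] is None) for r in rows]

nullity_corr = pearson(value_null, status_null)
print(f"nullity correlation VALUE vs STATUS: {nullity_corr:.2f}")
```

A value near -1 means one column is present exactly when the other is absent. Columns that are entirely missing, such as SYMBOL and TERMINATED here, have a constant indicator, so this correlation is undefined for them.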

Sample

index | REF_DATE | GEO | DGUID | Sector | Characteristics | Indicators | UOM | UOM_ID | SCALAR_FACTOR | SCALAR_ID | VECTOR | COORDINATE | VALUE | STATUS | SYMBOL | TERMINATED | DECIMALS
0 | 2010 | Canada | 2016A000011124 | Total non-profit institutions | Male employees | Number of jobs | Jobs | 190 | units | 0 | v1273033811 | 1.1.1.1 | 642584.00 | NaN | NaN | NaN | 0
1 | 2010 | Canada | 2016A000011124 | Total non-profit institutions | Male employees | Hours worked | Hours | 152 | thousands | 3 | v1273033812 | 1.1.1.2 | 1048516.00 | NaN | NaN | NaN | 0
2 | 2010 | Canada | 2016A000011124 | Total non-profit institutions | Male employees | Wages and salaries | Dollars | 81 | millions | 6 | v1273033813 | 1.1.1.3 | 30805.00 | NaN | NaN | NaN | 0
3 | 2010 | Canada | 2016A000011124 | Total non-profit institutions | Male employees | Average annual hours worked | Hours | 152 | units | 0 | v1273033814 | 1.1.1.4 | 1632.00 | NaN | NaN | NaN | 0
4 | 2010 | Canada | 2016A000011124 | Total non-profit institutions | Male employees | Average weekly hours worked | Hours | 152 | units | 0 | v1273033815 | 1.1.1.5 | 31.00 | NaN | NaN | NaN | 0
5 | 2010 | Canada | 2016A000011124 | Total non-profit institutions | Male employees | Average annual wages and salaries | Dollars | 81 | units | 0 | v1273033816 | 1.1.1.6 | 47940.00 | NaN | NaN | NaN | 0
6 | 2010 | Canada | 2016A000011124 | Total non-profit institutions | Male employees | Average hourly wage | Dollars | 81 | units | 0 | v1273033817 | 1.1.1.7 | 29.38 | NaN | NaN | NaN | 2
7 | 2010 | Canada | 2016A000011124 | Total non-profit institutions | Female employees | Number of jobs | Jobs | 190 | units | 0 | v1273033909 | 1.1.2.1 | 1500394.00 | NaN | NaN | NaN | 0
8 | 2010 | Canada | 2016A000011124 | Total non-profit institutions | Female employees | Hours worked | Hours | 152 | thousands | 3 | v1273033910 | 1.1.2.2 | 2331018.00 | NaN | NaN | NaN | 0
9 | 2010 | Canada | 2016A000011124 | Total non-profit institutions | Female employees | Wages and salaries | Dollars | 81 | millions | 6 | v1273033911 | 1.1.2.3 | 60943.00 | NaN | NaN | NaN | 0

index | REF_DATE | GEO | DGUID | Sector | Characteristics | Indicators | UOM | UOM_ID | SCALAR_FACTOR | SCALAR_ID | VECTOR | COORDINATE | VALUE | STATUS | SYMBOL | TERMINATED | DECIMALS
105830 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 55 to 64 years | Average weekly hours worked | Hours | 152 | units | 0 | v1273042530 | 14.5.17.5 | 33.00 | NaN | NaN | NaN | 0
105831 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 55 to 64 years | Average annual wages and salaries | Dollars | 81 | units | 0 | v1273042531 | 14.5.17.6 | 101380.00 | NaN | NaN | NaN | 0
105832 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 55 to 64 years | Average hourly wage | Dollars | 81 | units | 0 | v1273042532 | 14.5.17.7 | 59.98 | NaN | NaN | NaN | 2
105833 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 65 years old and over | Number of jobs | Jobs | 190 | units | 0 | v1273042624 | 14.5.18.1 | 27.00 | NaN | NaN | NaN | 0
105834 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 65 years old and over | Hours worked | Hours | 152 | thousands | 3 | v1273042625 | 14.5.18.2 | 30.00 | NaN | NaN | NaN | 0
105835 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 65 years old and over | Wages and salaries | Dollars | 81 | millions | 6 | v1273042626 | 14.5.18.3 | 2.00 | NaN | NaN | NaN | 0
105836 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 65 years old and over | Average annual hours worked | Hours | 152 | units | 0 | v1273042627 | 14.5.18.4 | 1111.00 | NaN | NaN | NaN | 0
105837 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 65 years old and over | Average weekly hours worked | Hours | 152 | units | 0 | v1273042628 | 14.5.18.5 | 21.00 | NaN | NaN | NaN | 0
105838 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 65 years old and over | Average annual wages and salaries | Dollars | 81 | units | 0 | v1273042629 | 14.5.18.6 | 74037.00 | NaN | NaN | NaN | 0
105839 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 65 years old and over | Average hourly wage | Dollars | 81 | units | 0 | v1273042630 | 14.5.18.7 | 66.63 | NaN | NaN | NaN | 2
Pandas Profiling Report

Overview

Dataset statistics

Number of variables: 17
Number of observations: 105840
Missing cells: 317520
Missing cells (%): 17.6%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 13.7 MiB
Average record size in memory: 136.0 B
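
The missing-cell total can be reproduced from the per-column missing counts that the report lists under Alerts. The following is a small sketch using only those reported figures; the variable names are illustrative.

```python
# Figures copied from the Overview and Alerts sections of this report.
n_rows, n_cols = 105840, 17
missing_per_column = {
    "VALUE": 3024,
    "STATUS": 102816,
    "SYMBOL": 105840,
    "TERMINATED": 105840,
}

missing_cells = sum(missing_per_column.values())
missing_pct = 100 * missing_cells / (n_rows * n_cols)

print(f"Missing cells: {missing_cells}")
print(f"Missing cells (%): {missing_pct:.1f}%")
```

Almost all of the 17.6% comes from three columns (STATUS, SYMBOL, TERMINATED) that are empty or nearly empty; VALUE itself is only 2.9% missing.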

Variable types

Numeric: 2
Categorical: 11
Text: 2
Unsupported: 2

Alerts

STATUS has constant value "" (Constant)
GEO is highly overall correlated with DGUID (High correlation)
DGUID is highly overall correlated with GEO (High correlation)
Indicators is highly overall correlated with UOM and 4 other fields (High correlation)
UOM is highly overall correlated with Indicators and 1 other fields (High correlation)
UOM_ID is highly overall correlated with Indicators and 1 other fields (High correlation)
SCALAR_FACTOR is highly overall correlated with Indicators and 1 other fields (High correlation)
SCALAR_ID is highly overall correlated with Indicators and 1 other fields (High correlation)
DECIMALS is highly overall correlated with Indicators (High correlation)
VALUE has 3024 (2.9%) missing values (Missing)
STATUS has 102816 (97.1%) missing values (Missing)
SYMBOL has 105840 (100.0%) missing values (Missing)
TERMINATED has 105840 (100.0%) missing values (Missing)
GEO is uniformly distributed (Uniform)
DGUID is uniformly distributed (Uniform)
Sector is uniformly distributed (Uniform)
Characteristics is uniformly distributed (Uniform)
Indicators is uniformly distributed (Uniform)
SYMBOL is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
TERMINATED is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
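
The Uniform alerts and the zero duplicate count are consistent with the table being a full cross product of its dimension columns. This is an inference from the distinct counts reported per variable, not something the report states. A short sketch of the arithmetic, with illustrative names:

```python
from math import prod

# Distinct counts reported for each dimension column in this report.
distinct = {
    "REF_DATE": 12,
    "GEO": 14,
    "Sector": 5,
    "Characteristics": 18,
    "Indicators": 7,
}

# If every combination occurs exactly once, the row count is the product.
expected_rows = prod(distinct.values())

# Under the same assumption, each level of a column appears equally often.
rows_per_level = {name: expected_rows // k for name, k in distinct.items()}

print(expected_rows)
print(rows_per_level)
```

The product equals the reported number of observations, and the per-level counts match the frequencies shown for each variable (for example 8820 rows per year and 7560 rows per GEO value).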

Reproduction

Analysis started: 2023-11-28 20:51:40.644089
Analysis finished: 2023-11-28 20:51:48.610309
Duration: 7.97 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

REF_DATE
Real number (ℝ)

Distinct: 12
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2015.5
Minimum: 2010
Maximum: 2021
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 827.0 KiB

Quantile statistics

Minimum: 2010
5-th percentile: 2010
Q1: 2012.75
Median: 2015.5
Q3: 2018.25
95-th percentile: 2021
Maximum: 2021
Range: 11
Interquartile range (IQR): 5.5

Descriptive statistics

Standard deviation: 3.4520688
Coefficient of variation (CV): 0.0017127605
Kurtosis: -1.216784
Mean: 2015.5
Median Absolute Deviation (MAD): 3
Skewness: 0
Sum: 2.1332052 × 10^8
Variance: 11.916779
Monotonicity: Increasing
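
Because REF_DATE takes 12 yearly values that each occur 8820 times, its summary statistics can be derived without the data. The sketch below assumes the report uses the sample variance (division by n - 1), which is an inference from the reported figures rather than something the report states.

```python
import math

years = range(2010, 2022)        # the 12 distinct REF_DATE values
rows_per_year = 8820             # count reported for every year
n = rows_per_year * len(years)   # total observations

mean = sum(years) / len(years)
total = rows_per_year * sum(years)

# Sum of squared deviations over all rows, then the sample variance.
ss = rows_per_year * sum((y - mean) ** 2 for y in years)
variance = ss / (n - 1)          # assumption: ddof = 1
std = math.sqrt(variance)

print(f"n = {n}, mean = {mean}, sum = {total}")
print(f"variance = {variance:.6f}, std = {std:.7f}")
```

The results agree with the reported mean, sum, variance and standard deviation, and the zero skewness follows from the values being symmetric around 2015.5.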
Histogram with fixed size bins (bins=12)
Value | Count | Frequency (%)
2010 | 8820 | 8.3%
2011 | 8820 | 8.3%
2012 | 8820 | 8.3%
2013 | 8820 | 8.3%
2014 | 8820 | 8.3%
2015 | 8820 | 8.3%
2016 | 8820 | 8.3%
2017 | 8820 | 8.3%
2018 | 8820 | 8.3%
2019 | 8820 | 8.3%
Other values (2) | 17640 | 16.7%

Value | Count | Frequency (%)
2010 | 8820 | 8.3%
2011 | 8820 | 8.3%
2012 | 8820 | 8.3%
2013 | 8820 | 8.3%
2014 | 8820 | 8.3%
2015 | 8820 | 8.3%
2016 | 8820 | 8.3%
2017 | 8820 | 8.3%
2018 | 8820 | 8.3%
2019 | 8820 | 8.3%

Value | Count | Frequency (%)
2021 | 8820 | 8.3%
2020 | 8820 | 8.3%
2019 | 8820 | 8.3%
2018 | 8820 | 8.3%
2017 | 8820 | 8.3%
2016 | 8820 | 8.3%
2015 | 8820 | 8.3%
2014 | 8820 | 8.3%
2013 | 8820 | 8.3%
2012 | 8820 | 8.3%

GEO
Categorical

HIGH CORRELATION  UNIFORM 

Distinct: 14
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB
Canada: 7560
Newfoundland and Labrador: 7560
Prince Edward Island: 7560
Nova Scotia: 7560
New Brunswick: 7560
Other values (9): 68040

Length

Max length25
Median length14.5
Mean length11.714286
Min length5

Characters and Unicode

Total characters1239840
Distinct characters34
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowCanada
2nd rowCanada
3rd rowCanada
4th rowCanada
5th rowCanada

Common Values

ValueCountFrequency (%)
Canada 7560
 
7.1%
Newfoundland and Labrador 7560
 
7.1%
Prince Edward Island 7560
 
7.1%
Nova Scotia 7560
 
7.1%
New Brunswick 7560
 
7.1%
Quebec 7560
 
7.1%
Ontario 7560
 
7.1%
Manitoba 7560
 
7.1%
Saskatchewan 7560
 
7.1%
Alberta 7560
 
7.1%
Other values (4) 30240
28.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
canada 7560
 
4.5%
newfoundland 7560
 
4.5%
territories 7560
 
4.5%
northwest 7560
 
4.5%
yukon 7560
 
4.5%
columbia 7560
 
4.5%
british 7560
 
4.5%
alberta 7560
 
4.5%
saskatchewan 7560
 
4.5%
manitoba 7560
 
4.5%
Other values (12) 90720
54.5%

Most occurring characters

ValueCountFrequency (%)
a 151200
 
12.2%
n 90720
 
7.3%
r 90720
 
7.3%
t 75600
 
6.1%
e 75600
 
6.1%
o 75600
 
6.1%
i 75600
 
6.1%
d 60480
 
4.9%
60480
 
4.9%
u 52920
 
4.3%
Other values (24) 430920
34.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1020600
82.3%
Uppercase Letter 158760
 
12.8%
Space Separator 60480
 
4.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 151200
14.8%
n 90720
 
8.9%
r 90720
 
8.9%
t 75600
 
7.4%
e 75600
 
7.4%
o 75600
 
7.4%
i 75600
 
7.4%
d 60480
 
5.9%
u 52920
 
5.2%
s 45360
 
4.4%
Other values (9) 226800
22.2%
Uppercase Letter
ValueCountFrequency (%)
N 37800
23.8%
B 15120
 
9.5%
S 15120
 
9.5%
C 15120
 
9.5%
I 7560
 
4.8%
E 7560
 
4.8%
P 7560
 
4.8%
L 7560
 
4.8%
Q 7560
 
4.8%
O 7560
 
4.8%
Other values (4) 30240
19.0%
Space Separator
ValueCountFrequency (%)
60480
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1179360
95.1%
Common 60480
 
4.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 151200
 
12.8%
n 90720
 
7.7%
r 90720
 
7.7%
t 75600
 
6.4%
e 75600
 
6.4%
o 75600
 
6.4%
i 75600
 
6.4%
d 60480
 
5.1%
u 52920
 
4.5%
s 45360
 
3.8%
Other values (23) 385560
32.7%
Common
ValueCountFrequency (%)
60480
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1239840
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 151200
 
12.2%
n 90720
 
7.3%
r 90720
 
7.3%
t 75600
 
6.1%
e 75600
 
6.1%
o 75600
 
6.1%
i 75600
 
6.1%
d 60480
 
4.9%
60480
 
4.9%
u 52920
 
4.3%
Other values (24) 430920
34.8%

DGUID
Categorical

HIGH CORRELATION  UNIFORM 

Distinct: 14
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB
2016A000011124: 7560
2016A000210: 7560
2016A000211: 7560
2016A000212: 7560
2016A000213: 7560
Other values (9): 68040

Length

Max length14
Median length11
Mean length11.214286
Min length11

Characters and Unicode

Total characters1186920
Distinct characters11
Distinct categories2 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row2016A000011124
2nd row2016A000011124
3rd row2016A000011124
4th row2016A000011124
5th row2016A000011124

Common Values

ValueCountFrequency (%)
2016A000011124 7560
 
7.1%
2016A000210 7560
 
7.1%
2016A000211 7560
 
7.1%
2016A000212 7560
 
7.1%
2016A000213 7560
 
7.1%
2016A000224 7560
 
7.1%
2016A000235 7560
 
7.1%
2016A000246 7560
 
7.1%
2016A000247 7560
 
7.1%
2016A000248 7560
 
7.1%
Other values (4) 30240
28.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2016a000011124 7560
 
7.1%
2016a000210 7560
 
7.1%
2016a000211 7560
 
7.1%
2016a000212 7560
 
7.1%
2016a000213 7560
 
7.1%
2016a000224 7560
 
7.1%
2016a000235 7560
 
7.1%
2016a000246 7560
 
7.1%
2016a000247 7560
 
7.1%
2016a000248 7560
 
7.1%
Other values (4) 30240
28.6%

Most occurring characters

ValueCountFrequency (%)
0 446040
37.6%
2 234360
19.7%
1 173880
 
14.6%
6 136080
 
11.5%
A 105840
 
8.9%
4 37800
 
3.2%
3 15120
 
1.3%
5 15120
 
1.3%
7 7560
 
0.6%
8 7560
 
0.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1081080
91.1%
Uppercase Letter 105840
 
8.9%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 446040
41.3%
2 234360
21.7%
1 173880
 
16.1%
6 136080
 
12.6%
4 37800
 
3.5%
3 15120
 
1.4%
5 15120
 
1.4%
7 7560
 
0.7%
8 7560
 
0.7%
9 7560
 
0.7%
Uppercase Letter
ValueCountFrequency (%)
A 105840
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 1081080
91.1%
Latin 105840
 
8.9%

Most frequent character per script

Common
ValueCountFrequency (%)
0 446040
41.3%
2 234360
21.7%
1 173880
 
16.1%
6 136080
 
12.6%
4 37800
 
3.5%
3 15120
 
1.4%
5 15120
 
1.4%
7 7560
 
0.7%
8 7560
 
0.7%
9 7560
 
0.7%
Latin
ValueCountFrequency (%)
A 105840
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1186920
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 446040
37.6%
2 234360
19.7%
1 173880
 
14.6%
6 136080
 
11.5%
A 105840
 
8.9%
4 37800
 
3.2%
3 15120
 
1.3%
5 15120
 
1.3%
7 7560
 
0.6%
8 7560
 
0.6%

Sector
Categorical

UNIFORM 

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB
Total non-profit institutions: 21168
Total non-profit institutions excluding governments: 21168
Non-profit institutions serving households (community organizations): 21168
Business non-profit institutions: 21168
Government non-profit institutions: 21168

Length

Max length68
Median length34
Mean length42.8
Min length29

Characters and Unicode

Total characters4529952
Distinct characters29
Distinct categories6 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
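
As an illustration of how such character properties can be tallied, the sketch below uses Python's standard `unicodedata` module on the five Sector labels, weighting each label by the 21168 rows reported for it. It is not the profiler's own implementation; the names are illustrative. The two-letter codes returned by `unicodedata.category` correspond to the category names used in the report: `Ll` (Lowercase Letter), `Lu` (Uppercase Letter), `Zs` (Space Separator), `Pd` (Dash Punctuation), `Ps` (Open Punctuation) and `Pe` (Close Punctuation).

```python
import unicodedata
from collections import Counter

# The five Sector labels listed in this report, each occurring 21168 times.
SECTOR_LABELS = [
    "Total non-profit institutions",
    "Total non-profit institutions excluding governments",
    "Non-profit institutions serving households (community organizations)",
    "Business non-profit institutions",
    "Government non-profit institutions",
]
ROWS_PER_LABEL = 21168

# Count characters per Unicode general category, weighted by row count.
category_counts = Counter()
for label in SECTOR_LABELS:
    for ch in label:
        category_counts[unicodedata.category(ch)] += ROWS_PER_LABEL

total_characters = sum(category_counts.values())

print(total_characters)
for code, count in category_counts.most_common():
    print(code, count, f"{100 * count / total_characters:.1f}%")
```

The totals agree with the "Total characters" and "Most occurring categories" figures reported for Sector.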

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowTotal non-profit institutions
2nd rowTotal non-profit institutions
3rd rowTotal non-profit institutions
4th rowTotal non-profit institutions
5th rowTotal non-profit institutions

Common Values

ValueCountFrequency (%)
Total non-profit institutions 21168
20.0%
Total non-profit institutions excluding governments 21168
20.0%
Non-profit institutions serving households (community organizations) 21168
20.0%
Business non-profit institutions 21168
20.0%
Government non-profit institutions 21168
20.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
non-profit 105840
25.0%
institutions 105840
25.0%
total 42336
 
10.0%
excluding 21168
 
5.0%
governments 21168
 
5.0%
serving 21168
 
5.0%
households 21168
 
5.0%
community 21168
 
5.0%
organizations 21168
 
5.0%
business 21168
 
5.0%

Most occurring characters

ValueCountFrequency (%)
n 613872
13.6%
i 550368
12.1%
t 550368
12.1%
o 508032
11.2%
s 381024
 
8.4%
317520
 
7.0%
r 190512
 
4.2%
u 190512
 
4.2%
e 169344
 
3.7%
f 105840
 
2.3%
Other values (19) 952560
21.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3958416
87.4%
Space Separator 317520
 
7.0%
Dash Punctuation 105840
 
2.3%
Uppercase Letter 105840
 
2.3%
Open Punctuation 21168
 
0.5%
Close Punctuation 21168
 
0.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 613872
15.5%
i 550368
13.9%
t 550368
13.9%
o 508032
12.8%
s 381024
9.6%
r 190512
 
4.8%
u 190512
 
4.8%
e 169344
 
4.3%
f 105840
 
2.7%
p 105840
 
2.7%
Other values (11) 592704
15.0%
Uppercase Letter
ValueCountFrequency (%)
T 42336
40.0%
N 21168
20.0%
B 21168
20.0%
G 21168
20.0%
Space Separator
ValueCountFrequency (%)
317520
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 105840
100.0%
Open Punctuation
ValueCountFrequency (%)
( 21168
100.0%
Close Punctuation
ValueCountFrequency (%)
) 21168
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 4064256
89.7%
Common 465696
 
10.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 613872
15.1%
i 550368
13.5%
t 550368
13.5%
o 508032
12.5%
s 381024
9.4%
r 190512
 
4.7%
u 190512
 
4.7%
e 169344
 
4.2%
f 105840
 
2.6%
p 105840
 
2.6%
Other values (15) 698544
17.2%
Common
ValueCountFrequency (%)
317520
68.2%
- 105840
 
22.7%
( 21168
 
4.5%
) 21168
 
4.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4529952
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 613872
13.6%
i 550368
12.1%
t 550368
12.1%
o 508032
11.2%
s 381024
 
8.4%
317520
 
7.0%
r 190512
 
4.2%
u 190512
 
4.2%
e 169344
 
3.7%
f 105840
 
2.3%
Other values (19) 952560
21.0%

Characteristics
Categorical

UNIFORM 

Distinct: 18
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB
Male employees: 5880
Female employees: 5880
Immigrant employees: 5880
Non-immigrant employees: 5880
Indigenous identity employees: 5880
Other values (13): 76440

Length

Max length33
Median length28
Mean length19.5
Min length14

Characters and Unicode

Total characters2063880
Distinct characters37
Distinct categories5 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowMale employees
2nd rowMale employees
3rd rowMale employees
4th rowMale employees
5th rowMale employees

Common Values

ValueCountFrequency (%)
Male employees 5880
 
5.6%
Female employees 5880
 
5.6%
Immigrant employees 5880
 
5.6%
Non-immigrant employees 5880
 
5.6%
Indigenous identity employees 5880
 
5.6%
Non-indigenous identity employees 5880
 
5.6%
Visible minority 5880
 
5.6%
Not a visible minority 5880
 
5.6%
High school diploma and less 5880
 
5.6%
Trade certificate 5880
 
5.6%
Other values (8) 47040
44.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
employees 35280
 
10.3%
years 35280
 
10.3%
to 29400
 
8.6%
and 17640
 
5.2%
identity 11760
 
3.4%
visible 11760
 
3.4%
minority 11760
 
3.4%
diploma 11760
 
3.4%
male 5880
 
1.7%
34 5880
 
1.7%
Other values (28) 164640
48.3%

Most occurring characters

ValueCountFrequency (%)
e 264600
12.8%
235200
 
11.4%
i 152880
 
7.4%
o 147000
 
7.1%
s 117600
 
5.7%
a 105840
 
5.1%
l 99960
 
4.8%
y 99960
 
4.8%
t 99960
 
4.8%
r 94080
 
4.6%
Other values (27) 646800
31.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1617000
78.3%
Space Separator 235200
 
11.4%
Decimal Number 129360
 
6.3%
Uppercase Letter 70560
 
3.4%
Dash Punctuation 11760
 
0.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 264600
16.4%
i 152880
9.5%
o 147000
9.1%
s 117600
 
7.3%
a 105840
 
6.5%
l 99960
 
6.2%
y 99960
 
6.2%
t 99960
 
6.2%
r 94080
 
5.8%
n 94080
 
5.8%
Other values (10) 341040
21.1%
Uppercase Letter
ValueCountFrequency (%)
N 17640
25.0%
I 11760
16.7%
H 5880
 
8.3%
T 5880
 
8.3%
C 5880
 
8.3%
U 5880
 
8.3%
V 5880
 
8.3%
F 5880
 
8.3%
M 5880
 
8.3%
Decimal Number
ValueCountFrequency (%)
5 47040
36.4%
4 41160
31.8%
2 11760
 
9.1%
3 11760
 
9.1%
6 11760
 
9.1%
1 5880
 
4.5%
Space Separator
ValueCountFrequency (%)
235200
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 11760
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1687560
81.8%
Common 376320
 
18.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 264600
15.7%
i 152880
 
9.1%
o 147000
 
8.7%
s 117600
 
7.0%
a 105840
 
6.3%
l 99960
 
5.9%
y 99960
 
5.9%
t 99960
 
5.9%
r 94080
 
5.6%
n 94080
 
5.6%
Other values (19) 411600
24.4%
Common
ValueCountFrequency (%)
235200
62.5%
5 47040
 
12.5%
4 41160
 
10.9%
2 11760
 
3.1%
3 11760
 
3.1%
- 11760
 
3.1%
6 11760
 
3.1%
1 5880
 
1.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2063880
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 264600
12.8%
235200
 
11.4%
i 152880
 
7.4%
o 147000
 
7.1%
s 117600
 
5.7%
a 105840
 
5.1%
l 99960
 
4.8%
y 99960
 
4.8%
t 99960
 
4.8%
r 94080
 
4.6%
Other values (27) 646800
31.3%

Indicators
Categorical

HIGH CORRELATION  UNIFORM 

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB
Number of jobs: 15120
Hours worked: 15120
Wages and salaries: 15120
Average annual hours worked: 15120
Average weekly hours worked: 15120
Other values (2): 30240

Length

Max length33
Median length19
Mean length21.428571
Min length12

Characters and Unicode

Total characters2268000
Distinct characters25
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowNumber of jobs
2nd rowHours worked
3rd rowWages and salaries
4th rowAverage annual hours worked
5th rowAverage weekly hours worked

Common Values

ValueCountFrequency (%)
Number of jobs 15120
14.3%
Hours worked 15120
14.3%
Wages and salaries 15120
14.3%
Average annual hours worked 15120
14.3%
Average weekly hours worked 15120
14.3%
Average annual wages and salaries 15120
14.3%
Average hourly wage 15120
14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
average 60480
16.7%
hours 45360
12.5%
worked 45360
12.5%
wages 30240
8.3%
and 30240
8.3%
salaries 30240
8.3%
annual 30240
8.3%
number 15120
 
4.2%
of 15120
 
4.2%
jobs 15120
 
4.2%
Other values (3) 45360
12.5%

Most occurring characters

ValueCountFrequency (%)
e 287280
12.7%
a 257040
11.3%
257040
11.3%
r 211680
 
9.3%
s 151200
 
6.7%
o 136080
 
6.0%
u 105840
 
4.7%
g 105840
 
4.7%
n 90720
 
4.0%
l 90720
 
4.0%
Other values (15) 574560
25.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1905120
84.0%
Space Separator 257040
 
11.3%
Uppercase Letter 105840
 
4.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 287280
15.1%
a 257040
13.5%
r 211680
11.1%
s 151200
 
7.9%
o 136080
 
7.1%
u 105840
 
5.6%
g 105840
 
5.6%
n 90720
 
4.8%
l 90720
 
4.8%
w 90720
 
4.8%
Other values (10) 378000
19.8%
Uppercase Letter
ValueCountFrequency (%)
A 60480
57.1%
W 15120
 
14.3%
H 15120
 
14.3%
N 15120
 
14.3%
Space Separator
ValueCountFrequency (%)
257040
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2010960
88.7%
Common 257040
 
11.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 287280
14.3%
a 257040
12.8%
r 211680
10.5%
s 151200
 
7.5%
o 136080
 
6.8%
u 105840
 
5.3%
g 105840
 
5.3%
n 90720
 
4.5%
l 90720
 
4.5%
w 90720
 
4.5%
Other values (14) 483840
24.1%
Common
ValueCountFrequency (%)
257040
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2268000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 287280
12.7%
a 257040
11.3%
257040
11.3%
r 211680
 
9.3%
s 151200
 
6.7%
o 136080
 
6.0%
u 105840
 
4.7%
g 105840
 
4.7%
n 90720
 
4.0%
l 90720
 
4.0%
Other values (15) 574560
25.3%

UOM
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB
Hours: 45360
Dollars: 45360
Jobs: 15120

Length

Max length7
Median length5
Mean length5.7142857
Min length4

Characters and Unicode

Total characters604800
Distinct characters10
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowJobs
2nd rowHours
3rd rowDollars
4th rowHours
5th rowHours

Common Values

ValueCountFrequency (%)
Hours 45360
42.9%
Dollars 45360
42.9%
Jobs 15120
 
14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
hours 45360
42.9%
dollars 45360
42.9%
jobs 15120
 
14.3%

Most occurring characters

ValueCountFrequency (%)
o 105840
17.5%
s 105840
17.5%
r 90720
15.0%
l 90720
15.0%
H 45360
7.5%
u 45360
7.5%
D 45360
7.5%
a 45360
7.5%
J 15120
 
2.5%
b 15120
 
2.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 498960
82.5%
Uppercase Letter 105840
 
17.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 105840
21.2%
s 105840
21.2%
r 90720
18.2%
l 90720
18.2%
u 45360
9.1%
a 45360
9.1%
b 15120
 
3.0%
Uppercase Letter
ValueCountFrequency (%)
H 45360
42.9%
D 45360
42.9%
J 15120
 
14.3%

Most occurring scripts

ValueCountFrequency (%)
Latin 604800
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 105840
17.5%
s 105840
17.5%
r 90720
15.0%
l 90720
15.0%
H 45360
7.5%
u 45360
7.5%
D 45360
7.5%
a 45360
7.5%
J 15120
 
2.5%
b 15120
 
2.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 604800
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 105840
17.5%
s 105840
17.5%
r 90720
15.0%
l 90720
15.0%
H 45360
7.5%
u 45360
7.5%
D 45360
7.5%
a 45360
7.5%
J 15120
 
2.5%
b 15120
 
2.5%

UOM_ID
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB
152: 45360
81: 45360
190: 15120

Length

Max length3
Median length3
Mean length2.5714286
Min length2

Characters and Unicode

Total characters272160
Distinct characters6
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row190
2nd row152
3rd row81
4th row152
5th row152

Common Values

ValueCountFrequency (%)
152 45360
42.9%
81 45360
42.9%
190 15120
 
14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
152 45360
42.9%
81 45360
42.9%
190 15120
 
14.3%

Most occurring characters

ValueCountFrequency (%)
1 105840
38.9%
5 45360
16.7%
2 45360
16.7%
8 45360
16.7%
9 15120
 
5.6%
0 15120
 
5.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 272160
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 105840
38.9%
5 45360
16.7%
2 45360
16.7%
8 45360
16.7%
9 15120
 
5.6%
0 15120
 
5.6%

Most occurring scripts

ValueCountFrequency (%)
Common 272160
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1 105840
38.9%
5 45360
16.7%
2 45360
16.7%
8 45360
16.7%
9 15120
 
5.6%
0 15120
 
5.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 272160
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 105840
38.9%
5 45360
16.7%
2 45360
16.7%
8 45360
16.7%
9 15120
 
5.6%
0 15120
 
5.6%

SCALAR_FACTOR
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB
units: 75600
thousands: 15120
millions: 15120

Length

Max length9
Median length5
Mean length6
Min length5

Characters and Unicode

Total characters635040
Distinct characters11
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowunits
2nd rowthousands
3rd rowmillions
4th rowunits
5th rowunits

Common Values

ValueCountFrequency (%)
units 75600
71.4%
thousands 15120
 
14.3%
millions 15120
 
14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
units 75600
71.4%
thousands 15120
 
14.3%
millions 15120
 
14.3%

Most occurring characters

ValueCountFrequency (%)
s 120960
19.0%
n 105840
16.7%
i 105840
16.7%
u 90720
14.3%
t 90720
14.3%
o 30240
 
4.8%
l 30240
 
4.8%
h 15120
 
2.4%
a 15120
 
2.4%
d 15120
 
2.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 635040
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
s 120960
19.0%
n 105840
16.7%
i 105840
16.7%
u 90720
14.3%
t 90720
14.3%
o 30240
 
4.8%
l 30240
 
4.8%
h 15120
 
2.4%
a 15120
 
2.4%
d 15120
 
2.4%

Most occurring scripts

ValueCountFrequency (%)
Latin 635040
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
s 120960
19.0%
n 105840
16.7%
i 105840
16.7%
u 90720
14.3%
t 90720
14.3%
o 30240
 
4.8%
l 30240
 
4.8%
h 15120
 
2.4%
a 15120
 
2.4%
d 15120
 
2.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 635040
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
s 120960
19.0%
n 105840
16.7%
i 105840
16.7%
u 90720
14.3%
t 90720
14.3%
o 30240
 
4.8%
l 30240
 
4.8%
h 15120
 
2.4%
a 15120
 
2.4%
d 15120
 
2.4%

SCALAR_ID
Categorical

HIGH CORRELATION 

Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size827.0 KiB
0
75600 
3
15120 
6
15120 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters105840
Distinct characters3
Distinct categories1
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st row0
2nd row3
3rd row6
4th row0
5th row0

Common Values

ValueCountFrequency (%)
0 75600
71.4%
3 15120
 
14.3%
6 15120
 
14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0 75600
71.4%
3 15120
 
14.3%
6 15120
 
14.3%

Most occurring characters

ValueCountFrequency (%)
0 75600
71.4%
3 15120
 
14.3%
6 15120
 
14.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 105840
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 75600
71.4%
3 15120
 
14.3%
6 15120
 
14.3%

Most occurring scripts

ValueCountFrequency (%)
Common 105840
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 75600
71.4%
3 15120
 
14.3%
6 15120
 
14.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 105840
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 75600
71.4%
3 15120
 
14.3%
6 15120
 
14.3%
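
The SCALAR_FACTOR and SCALAR_ID counts line up exactly (units/0 with 75600 rows, thousands/3 and millions/6 with 15120 rows each), which suggests that SCALAR_ID is the power of ten by which VALUE is scaled. The sketch below assumes that reading; it is not confirmed anywhere in this report. The function name `to_base_units` is hypothetical, and the inputs are copied from the sample rows of this report.

```python
def to_base_units(value, scalar_id):
    """Rescale a reported VALUE to base units.

    Assumption: SCALAR_ID is a power-of-ten exponent
    (0 = units, 3 = thousands, 6 = millions).
    """
    return value * 10 ** scalar_id


# Rows taken from the report's sample table (Canada, 2010, male employees).
jobs = to_base_units(642584.0, 0)     # "Number of jobs", units
hours = to_base_units(1048516.0, 3)   # "Hours worked", thousands
wages = to_base_units(30805.0, 6)     # "Wages and salaries", millions

print(jobs, hours, wages)
```

Rescaling before aggregation matters here because VALUE mixes indicators reported in units, thousands, and millions within one column.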

VECTOR
Text

Distinct8820
Distinct (%)8.3%
Missing0
Missing (%)0.0%
Memory size827.0 KiB

Length

Max length11
Median length11
Mean length11
Min length11

Characters and Unicode

Total characters1164240
Distinct characters11
Distinct categories2
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
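
As a rough illustration of the character properties mentioned above, the following sketch counts Unicode general categories with Python's standard `unicodedata` module. The helper name `category_counts` is hypothetical, and the two input strings are VECTOR values from this report's sample rows. The codes `Nd` and `Ll` correspond to the report's "Decimal Number" and "Lowercase Letter" labels; script and block lookups are not covered because `unicodedata` does not expose them.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over every character in the given strings."""
    counts = Counter()
    for value in values:
        counts.update(unicodedata.category(ch) for ch in value)
    return counts


# Two VECTOR values from the report's sample rows.
sample = ["v1273033811", "v1273033812"]
counts = category_counts(sample)
print(counts)
```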

Unique

Unique0
Unique (%)0.0%

Sample

1st rowv1273033811
2nd rowv1273033812
3rd rowv1273033813
4th rowv1273033814
5th rowv1273033815
ValueCountFrequency (%)
v1273033811 12
 
< 0.1%
v1273033912 12
 
< 0.1%
v1273034009 12
 
< 0.1%
v1273034008 12
 
< 0.1%
v1273034007 12
 
< 0.1%
v1273033915 12
 
< 0.1%
v1273033914 12
 
< 0.1%
v1273033913 12
 
< 0.1%
v1273033911 12
 
< 0.1%
v1273034698 12
 
< 0.1%
Other values (8810) 105720
99.9%

Most occurring characters

ValueCountFrequency (%)
3 214332
18.4%
1 149892
12.9%
0 149784
12.9%
7 148584
12.8%
2 145476
12.5%
v 105840
9.1%
4 75516
 
6.5%
5 43944
 
3.8%
9 43944
 
3.8%
8 43812
 
3.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1058400
90.9%
Lowercase Letter 105840
 
9.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3 214332
20.3%
1 149892
14.2%
0 149784
14.2%
7 148584
14.0%
2 145476
13.7%
4 75516
 
7.1%
5 43944
 
4.2%
9 43944
 
4.2%
8 43812
 
4.1%
6 43116
 
4.1%
Lowercase Letter
ValueCountFrequency (%)
v 105840
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 1058400
90.9%
Latin 105840
 
9.1%

Most frequent character per script

Common
ValueCountFrequency (%)
3 214332
20.3%
1 149892
14.2%
0 149784
14.2%
7 148584
14.0%
2 145476
13.7%
4 75516
 
7.1%
5 43944
 
4.2%
9 43944
 
4.2%
8 43812
 
4.1%
6 43116
 
4.1%
Latin
ValueCountFrequency (%)
v 105840
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1164240
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3 214332
18.4%
1 149892
12.9%
0 149784
12.9%
7 148584
12.8%
2 145476
12.5%
v 105840
9.1%
4 75516
 
6.5%
5 43944
 
3.8%
9 43944
 
3.8%
8 43812
 
3.8%

COORDINATE
Text

Distinct8820
Distinct (%)8.3%
Missing0
Missing (%)0.0%
Memory size827.0 KiB

Length

Max length9
Median length8.5
Mean length7.8571429
Min length7

Characters and Unicode

Total characters831600
Distinct characters11
Distinct categories2
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st row1.1.1.1
2nd row1.1.1.2
3rd row1.1.1.3
4th row1.1.1.4
5th row1.1.1.5
ValueCountFrequency (%)
1.1.1.1 12
 
< 0.1%
1.1.2.4 12
 
< 0.1%
1.1.3.3 12
 
< 0.1%
1.1.3.2 12
 
< 0.1%
1.1.3.1 12
 
< 0.1%
1.1.2.7 12
 
< 0.1%
1.1.2.6 12
 
< 0.1%
1.1.2.5 12
 
< 0.1%
1.1.2.3 12
 
< 0.1%
1.1.10.6 12
 
< 0.1%
Other values (8810) 105720
99.9%

Most occurring characters

ValueCountFrequency (%)
. 317520
38.2%
1 153888
18.5%
2 63168
 
7.6%
3 63168
 
7.6%
4 63168
 
7.6%
5 55608
 
6.7%
6 34440
 
4.1%
7 34440
 
4.1%
8 19320
 
2.3%
0 13440
 
1.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 514080
61.8%
Other Punctuation 317520
38.2%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 153888
29.9%
2 63168
12.3%
3 63168
12.3%
4 63168
12.3%
5 55608
 
10.8%
6 34440
 
6.7%
7 34440
 
6.7%
8 19320
 
3.8%
0 13440
 
2.6%
9 13440
 
2.6%
Other Punctuation
ValueCountFrequency (%)
. 317520
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 831600
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
. 317520
38.2%
1 153888
18.5%
2 63168
 
7.6%
3 63168
 
7.6%
4 63168
 
7.6%
5 55608
 
6.7%
6 34440
 
4.1%
7 34440
 
4.1%
8 19320
 
2.3%
0 13440
 
1.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 831600
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
. 317520
38.2%
1 153888
18.5%
2 63168
 
7.6%
3 63168
 
7.6%
4 63168
 
7.6%
5 55608
 
6.7%
6 34440
 
4.1%
7 34440
 
4.1%
8 19320
 
2.3%
0 13440
 
1.6%

VALUE
Real number (ℝ)

MISSING 

Distinct35398
Distinct (%)34.4%
Missing3024
Missing (%)2.9%
Infinite0
Infinite (%)0.0%
Mean26419.543
Minimum0
Maximum3857813
Zeros9
Zeros (%)< 0.1%
Negative0
Negative (%)0.0%
Memory size827.0 KiB

Quantile statistics

Minimum0
5-th percentile18.6075
Q132
median1386
Q317660
95-th percentile85576.5
Maximum3857813
Range3857813
Interquartile range (IQR)17628

Descriptive statistics

Standard deviation118041.35
Coefficient of variation (CV)4.4679558
Kurtosis266.93641
Mean26419.543
Median Absolute Deviation (MAD)1358
Skewness13.831207
Sum2.7163518 × 10^9
Variance1.393376 × 10^10
MonotonicityNot monotonic
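
The derived VALUE statistics above can be cross-checked against one another. The sketch below recomputes them from the reported figures using the usual definitions (CV = standard deviation / mean, IQR = Q3 − Q1, variance = standard deviation squared, sum = mean × non-missing count). It assumes the report follows these definitions; all inputs are copied from this section, and the variable names are illustrative only.

```python
# Figures reported for VALUE in this section.
mean = 26419.543
std = 118041.35
q1, q3 = 32.0, 17660.0
minimum, maximum = 0.0, 3857813.0
n_rows, n_missing = 105840, 3024

cv = std / mean                       # coefficient of variation
iqr = q3 - q1                         # interquartile range
value_range = maximum - minimum       # range
variance = std ** 2                   # variance from the standard deviation
total = mean * (n_rows - n_missing)   # sum over non-missing observations

print(f"CV={cv:.7f} IQR={iqr} range={value_range}")
print(f"variance={variance:.6e} sum={total:.7e}")
```

Under these assumptions the results agree with the reported values to the displayed precision, including the sum and variance, which are of order 10^9 and 10^10 respectively.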
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
30 2040
 
1.9%
31 1949
 
1.8%
32 1844
 
1.7%
29 1562
 
1.5%
33 1476
 
1.4%
34 1005
 
0.9%
28 905
 
0.9%
35 638
 
0.6%
27 523
 
0.5%
36 457
 
0.4%
Other values (35388) 90417
85.4%
(Missing) 3024
 
2.9%
ValueCountFrequency (%)
0 9
 
< 0.1%
1 219
0.2%
2 299
0.3%
3 201
0.2%
4 203
0.2%
5 157
0.1%
6 147
0.1%
7 135
0.1%
8 179
0.2%
9 145
0.1%
ValueCountFrequency (%)
3857813 1
< 0.1%
3690238 1
< 0.1%
3643952 1
< 0.1%
3581317 1
< 0.1%
3556730 1
< 0.1%
3544822 1
< 0.1%
3487265 1
< 0.1%
3425218 1
< 0.1%
3402343 1
< 0.1%
3372266 1
< 0.1%

STATUS
Categorical

CONSTANT  MISSING 

Distinct1
Distinct (%)< 0.1%
Missing102816
Missing (%)97.1%
Memory size827.0 KiB
x
3024 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters3024
Distinct characters1
Distinct categories1
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowx
2nd rowx
3rd rowx
4th rowx
5th rowx

Common Values

ValueCountFrequency (%)
x 3024
 
2.9%
(Missing) 102816
97.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
x 3024
100.0%

Most occurring characters

ValueCountFrequency (%)
x 3024
100.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3024
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
x 3024
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 3024
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
x 3024
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3024
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
x 3024
100.0%

SYMBOL
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing105840
Missing (%)100.0%
Memory size827.0 KiB

TERMINATED
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing105840
Missing (%)100.0%
Memory size827.0 KiB

DECIMALS
Categorical

HIGH CORRELATION 

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size827.0 KiB
0
90720 
2
15120 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters105840
Distinct characters2
Distinct categories1
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row0

Common Values

ValueCountFrequency (%)
0 90720
85.7%
2 15120
 
14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0 90720
85.7%
2 15120
 
14.3%

Most occurring characters

ValueCountFrequency (%)
0 90720
85.7%
2 15120
 
14.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 105840
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 90720
85.7%
2 15120
 
14.3%

Most occurring scripts

ValueCountFrequency (%)
Common 105840
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 90720
85.7%
2 15120
 
14.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 105840
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 90720
85.7%
2 15120
 
14.3%

Interactions

[Figure: interaction plots]

Correlations

 | REF_DATE | VALUE | GEO | DGUID | Sector | Characteristics | Indicators | UOM | UOM_ID | SCALAR_FACTOR | SCALAR_ID | DECIMALS
REF_DATE | 1.000 | 0.027 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
VALUE | 0.027 | 1.000 | 0.090 | 0.090 | 0.051 | 0.044 | 0.075 | 0.081 | 0.081 | 0.105 | 0.105 | 0.044
GEO | 0.000 | 0.090 | 1.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
DGUID | 0.000 | 0.090 | 1.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
Sector | 0.000 | 0.051 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
Characteristics | 0.000 | 0.044 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
Indicators | 0.000 | 0.075 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
UOM | 0.000 | 0.081 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000 | 1.000 | 0.447 | 0.447 | 0.471
UOM_ID | 0.000 | 0.081 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000 | 1.000 | 0.447 | 0.447 | 0.471
SCALAR_FACTOR | 0.000 | 0.105 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.447 | 0.447 | 1.000 | 1.000 | 0.258
SCALAR_ID | 0.000 | 0.105 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.447 | 0.447 | 1.000 | 1.000 | 0.258
DECIMALS | 0.000 | 0.044 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.471 | 0.471 | 0.258 | 0.258 | 1.000
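
The report does not say which statistic sits behind each cell of this matrix. For pairs of categorical columns, an association measure such as Cramér's V is a common choice, so the sketch below shows the plain, uncorrected form of that statistic purely as an illustration. The profiling library may use a different or bias-corrected variant, and the function name `cramers_v` and the toy data are assumptions. A one-to-one pairing such as SCALAR_FACTOR and SCALAR_ID gives the maximum value of 1, which matches the 1.000 entries above.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equally long sequences of labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for x, row_n in row_totals.items():
        for y, col_n in col_totals.items():
            expected = row_n * col_n / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Toy data in the same 5:1:1 proportion as SCALAR_FACTOR / SCALAR_ID.
scalar_factor = ["units"] * 5 + ["thousands", "millions"]
scalar_id = ["0"] * 5 + ["3", "6"]
v_paired = cramers_v(scalar_factor, scalar_id)

# Two labels that vary independently of each other.
v_independent = cramers_v(["a", "a", "b", "b"], ["p", "q", "p", "q"])

print(v_paired, v_independent)
```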

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
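
Nullity correlation can be sketched as the Pearson correlation between per-column missingness indicators. The example below uses a few hypothetical rows in which VALUE is missing exactly where STATUS is present; the matching counts in this report (3024 missing VALUE entries and 3024 STATUS entries equal to "x") are consistent with that pattern but do not prove it. The helper `pearson` and the toy rows are illustrative only.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


# Hypothetical (VALUE, STATUS) pairs; None marks a missing cell.
rows = [(642584.0, None), (None, "x"), (30805.0, None), (None, "x"), (1632.0, None)]

value_missing = [1 if value is None else 0 for value, _ in rows]
status_missing = [1 if status is None else 0 for _, status in rows]

r = pearson(value_missing, status_missing)
print(r)  # close to -1: one column is missing exactly when the other is present
```

With pandas available, the same quantity for every column pair can be obtained with `df.isna().corr()`; columns that are never or always missing have zero variance and yield undefined entries.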

Sample

 | REF_DATE | GEO | DGUID | Sector | Characteristics | Indicators | UOM | UOM_ID | SCALAR_FACTOR | SCALAR_ID | VECTOR | COORDINATE | VALUE | STATUS | SYMBOL | TERMINATED | DECIMALS
0 | 2010 | Canada | 2016A000011124 | Total non-profit institutions | Male employees | Number of jobs | Jobs | 190 | units | 0 | v1273033811 | 1.1.1.1 | 642584.00 | NaN | NaN | NaN | 0
1 | 2010 | Canada | 2016A000011124 | Total non-profit institutions | Male employees | Hours worked | Hours | 152 | thousands | 3 | v1273033812 | 1.1.1.2 | 1048516.00 | NaN | NaN | NaN | 0
2 | 2010 | Canada | 2016A000011124 | Total non-profit institutions | Male employees | Wages and salaries | Dollars | 81 | millions | 6 | v1273033813 | 1.1.1.3 | 30805.00 | NaN | NaN | NaN | 0
3 | 2010 | Canada | 2016A000011124 | Total non-profit institutions | Male employees | Average annual hours worked | Hours | 152 | units | 0 | v1273033814 | 1.1.1.4 | 1632.00 | NaN | NaN | NaN | 0
4 | 2010 | Canada | 2016A000011124 | Total non-profit institutions | Male employees | Average weekly hours worked | Hours | 152 | units | 0 | v1273033815 | 1.1.1.5 | 31.00 | NaN | NaN | NaN | 0
5 | 2010 | Canada | 2016A000011124 | Total non-profit institutions | Male employees | Average annual wages and salaries | Dollars | 81 | units | 0 | v1273033816 | 1.1.1.6 | 47940.00 | NaN | NaN | NaN | 0
6 | 2010 | Canada | 2016A000011124 | Total non-profit institutions | Male employees | Average hourly wage | Dollars | 81 | units | 0 | v1273033817 | 1.1.1.7 | 29.38 | NaN | NaN | NaN | 2
7 | 2010 | Canada | 2016A000011124 | Total non-profit institutions | Female employees | Number of jobs | Jobs | 190 | units | 0 | v1273033909 | 1.1.2.1 | 1500394.00 | NaN | NaN | NaN | 0
8 | 2010 | Canada | 2016A000011124 | Total non-profit institutions | Female employees | Hours worked | Hours | 152 | thousands | 3 | v1273033910 | 1.1.2.2 | 2331018.00 | NaN | NaN | NaN | 0
9 | 2010 | Canada | 2016A000011124 | Total non-profit institutions | Female employees | Wages and salaries | Dollars | 81 | millions | 6 | v1273033911 | 1.1.2.3 | 60943.00 | NaN | NaN | NaN | 0

 | REF_DATE | GEO | DGUID | Sector | Characteristics | Indicators | UOM | UOM_ID | SCALAR_FACTOR | SCALAR_ID | VECTOR | COORDINATE | VALUE | STATUS | SYMBOL | TERMINATED | DECIMALS
105830 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 55 to 64 years | Average weekly hours worked | Hours | 152 | units | 0 | v1273042530 | 14.5.17.5 | 33.00 | NaN | NaN | NaN | 0
105831 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 55 to 64 years | Average annual wages and salaries | Dollars | 81 | units | 0 | v1273042531 | 14.5.17.6 | 101380.00 | NaN | NaN | NaN | 0
105832 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 55 to 64 years | Average hourly wage | Dollars | 81 | units | 0 | v1273042532 | 14.5.17.7 | 59.98 | NaN | NaN | NaN | 2
105833 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 65 years old and over | Number of jobs | Jobs | 190 | units | 0 | v1273042624 | 14.5.18.1 | 27.00 | NaN | NaN | NaN | 0
105834 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 65 years old and over | Hours worked | Hours | 152 | thousands | 3 | v1273042625 | 14.5.18.2 | 30.00 | NaN | NaN | NaN | 0
105835 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 65 years old and over | Wages and salaries | Dollars | 81 | millions | 6 | v1273042626 | 14.5.18.3 | 2.00 | NaN | NaN | NaN | 0
105836 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 65 years old and over | Average annual hours worked | Hours | 152 | units | 0 | v1273042627 | 14.5.18.4 | 1111.00 | NaN | NaN | NaN | 0
105837 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 65 years old and over | Average weekly hours worked | Hours | 152 | units | 0 | v1273042628 | 14.5.18.5 | 21.00 | NaN | NaN | NaN | 0
105838 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 65 years old and over | Average annual wages and salaries | Dollars | 81 | units | 0 | v1273042629 | 14.5.18.6 | 74037.00 | NaN | NaN | NaN | 0
105839 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 65 years old and over | Average hourly wage | Dollars | 81 | units | 0 | v1273042630 | 14.5.18.7 | 66.63 | NaN | NaN | NaN | 2
Pandas Profiling Report

Overview

Dataset statistics

Number of variables17
Number of observations105840
Missing cells317520
Missing cells (%)17.6%
Duplicate rows0
Duplicate rows (%)0.0%
Total size in memory13.7 MiB
Average record size in memory136.0 B

Variable types

Numeric2
Categorical11
Text2
Unsupported2

Alerts

STATUS has constant value ""Constant
GEO is highly overall correlated with DGUIDHigh correlation
DGUID is highly overall correlated with GEOHigh correlation
Indicators is highly overall correlated with UOM and 4 other fieldsHigh correlation
UOM is highly overall correlated with Indicators and 1 other fieldsHigh correlation
UOM_ID is highly overall correlated with Indicators and 1 other fieldsHigh correlation
SCALAR_FACTOR is highly overall correlated with Indicators and 1 other fieldsHigh correlation
SCALAR_ID is highly overall correlated with Indicators and 1 other fieldsHigh correlation
DECIMALS is highly overall correlated with IndicatorsHigh correlation
VALUE has 3024 (2.9%) missing valuesMissing
STATUS has 102816 (97.1%) missing valuesMissing
SYMBOL has 105840 (100.0%) missing valuesMissing
TERMINATED has 105840 (100.0%) missing valuesMissing
GEO is uniformly distributedUniform
DGUID is uniformly distributedUniform
Sector is uniformly distributedUniform
Characteristics is uniformly distributedUniform
Indicators is uniformly distributedUniform
SYMBOL is an unsupported type, check if it needs cleaning or further analysisUnsupported
TERMINATED is an unsupported type, check if it needs cleaning or further analysisUnsupported
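
The "Missing cells" total in the dataset statistics can be reproduced from the per-column figures in the alerts above. The sketch assumes that the four columns flagged as Missing are the only ones with missing values, which matches the "Missing0" entries shown for the other variables; the dictionary and variable names are illustrative.

```python
# Missing-value counts taken from the alerts in this report.
missing_per_column = {
    "VALUE": 3024,
    "STATUS": 102816,
    "SYMBOL": 105840,
    "TERMINATED": 105840,
}
n_rows, n_columns = 105840, 17

total_missing = sum(missing_per_column.values())
missing_pct = 100 * total_missing / (n_rows * n_columns)

print(total_missing, round(missing_pct, 1))
```

Because SYMBOL and TERMINATED are entirely empty and VALUE plus STATUS together account for exactly one more full column, the total equals three full columns out of 17.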

Reproduction

Analysis started2023-12-20 15:26:18.100439
Analysis finished2023-12-20 15:26:25.640193
Duration7.54 seconds
Software versionydata-profiling v4.5.1
Download configurationconfig.json

Variables

REF_DATE
Real number (ℝ)

Distinct12
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean2015.5
Minimum2010
Maximum2021
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size827.0 KiB

Quantile statistics

Minimum2010
5-th percentile2010
Q12012.75
median2015.5
Q32018.25
95-th percentile2021
Maximum2021
Range11
Interquartile range (IQR)5.5

Descriptive statistics

Standard deviation3.4520688
Coefficient of variation (CV)0.0017127605
Kurtosis-1.216784
Mean2015.5
Median Absolute Deviation (MAD)3
Skewness0
Sum2.1332052 × 10^8
Variance11.916779
MonotonicityIncreasing
Histogram with fixed size bins (bins=12)
ValueCountFrequency (%)
2010 8820
8.3%
2011 8820
8.3%
2012 8820
8.3%
2013 8820
8.3%
2014 8820
8.3%
2015 8820
8.3%
2016 8820
8.3%
2017 8820
8.3%
2018 8820
8.3%
2019 8820
8.3%
Other values (2) 17640
16.7%
ValueCountFrequency (%)
2010 8820
8.3%
2011 8820
8.3%
2012 8820
8.3%
2013 8820
8.3%
2014 8820
8.3%
2015 8820
8.3%
2016 8820
8.3%
2017 8820
8.3%
2018 8820
8.3%
2019 8820
8.3%
ValueCountFrequency (%)
2021 8820
8.3%
2020 8820
8.3%
2019 8820
8.3%
2018 8820
8.3%
2017 8820
8.3%
2016 8820
8.3%
2015 8820
8.3%
2014 8820
8.3%
2013 8820
8.3%
2012 8820
8.3%

GEO
Categorical

HIGH CORRELATION  UNIFORM 

Distinct14
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size827.0 KiB
Canada
7560 
Newfoundland and Labrador
7560 
Prince Edward Island
7560 
Nova Scotia
7560 
New Brunswick
7560 
Other values (9)
68040 

Length

Max length25
Median length14.5
Mean length11.714286
Min length5

Characters and Unicode

Total characters1239840
Distinct characters34
Distinct categories3
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowCanada
2nd rowCanada
3rd rowCanada
4th rowCanada
5th rowCanada

Common Values

ValueCountFrequency (%)
Canada 7560
 
7.1%
Newfoundland and Labrador 7560
 
7.1%
Prince Edward Island 7560
 
7.1%
Nova Scotia 7560
 
7.1%
New Brunswick 7560
 
7.1%
Quebec 7560
 
7.1%
Ontario 7560
 
7.1%
Manitoba 7560
 
7.1%
Saskatchewan 7560
 
7.1%
Alberta 7560
 
7.1%
Other values (4) 30240
28.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
canada 7560
 
4.5%
newfoundland 7560
 
4.5%
territories 7560
 
4.5%
northwest 7560
 
4.5%
yukon 7560
 
4.5%
columbia 7560
 
4.5%
british 7560
 
4.5%
alberta 7560
 
4.5%
saskatchewan 7560
 
4.5%
manitoba 7560
 
4.5%
Other values (12) 90720
54.5%

Most occurring characters

ValueCountFrequency (%)
a 151200
 
12.2%
n 90720
 
7.3%
r 90720
 
7.3%
t 75600
 
6.1%
e 75600
 
6.1%
o 75600
 
6.1%
i 75600
 
6.1%
d 60480
 
4.9%
60480
 
4.9%
u 52920
 
4.3%
Other values (24) 430920
34.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1020600
82.3%
Uppercase Letter 158760
 
12.8%
Space Separator 60480
 
4.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 151200
14.8%
n 90720
 
8.9%
r 90720
 
8.9%
t 75600
 
7.4%
e 75600
 
7.4%
o 75600
 
7.4%
i 75600
 
7.4%
d 60480
 
5.9%
u 52920
 
5.2%
s 45360
 
4.4%
Other values (9) 226800
22.2%
Uppercase Letter
ValueCountFrequency (%)
N 37800
23.8%
B 15120
 
9.5%
S 15120
 
9.5%
C 15120
 
9.5%
I 7560
 
4.8%
E 7560
 
4.8%
P 7560
 
4.8%
L 7560
 
4.8%
Q 7560
 
4.8%
O 7560
 
4.8%
Other values (4) 30240
19.0%
Space Separator
ValueCountFrequency (%)
60480
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1179360
95.1%
Common 60480
 
4.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 151200
 
12.8%
n 90720
 
7.7%
r 90720
 
7.7%
t 75600
 
6.4%
e 75600
 
6.4%
o 75600
 
6.4%
i 75600
 
6.4%
d 60480
 
5.1%
u 52920
 
4.5%
s 45360
 
3.8%
Other values (23) 385560
32.7%
Common
ValueCountFrequency (%)
60480
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1239840
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 151200
 
12.2%
n 90720
 
7.3%
r 90720
 
7.3%
t 75600
 
6.1%
e 75600
 
6.1%
o 75600
 
6.1%
i 75600
 
6.1%
d 60480
 
4.9%
60480
 
4.9%
u 52920
 
4.3%
Other values (24) 430920
34.8%

DGUID
Categorical

HIGH CORRELATION  UNIFORM 

Distinct14
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size827.0 KiB
2016A000011124
7560 
2016A000210
7560 
2016A000211
7560 
2016A000212
7560 
2016A000213
7560 
Other values (9)
68040 

Length

Max length14
Median length11
Mean length11.214286
Min length11

Characters and Unicode

Total characters1186920
Distinct characters11
Distinct categories2
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st row2016A000011124
2nd row2016A000011124
3rd row2016A000011124
4th row2016A000011124
5th row2016A000011124

Common Values

ValueCountFrequency (%)
2016A000011124 7560
 
7.1%
2016A000210 7560
 
7.1%
2016A000211 7560
 
7.1%
2016A000212 7560
 
7.1%
2016A000213 7560
 
7.1%
2016A000224 7560
 
7.1%
2016A000235 7560
 
7.1%
2016A000246 7560
 
7.1%
2016A000247 7560
 
7.1%
2016A000248 7560
 
7.1%
Other values (4) 30240
28.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2016a000011124 7560
 
7.1%
2016a000210 7560
 
7.1%
2016a000211 7560
 
7.1%
2016a000212 7560
 
7.1%
2016a000213 7560
 
7.1%
2016a000224 7560
 
7.1%
2016a000235 7560
 
7.1%
2016a000246 7560
 
7.1%
2016a000247 7560
 
7.1%
2016a000248 7560
 
7.1%
Other values (4) 30240
28.6%

Most occurring characters

ValueCountFrequency (%)
0 446040
37.6%
2 234360
19.7%
1 173880
 
14.6%
6 136080
 
11.5%
A 105840
 
8.9%
4 37800
 
3.2%
3 15120
 
1.3%
5 15120
 
1.3%
7 7560
 
0.6%
8 7560
 
0.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1081080
91.1%
Uppercase Letter 105840
 
8.9%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 446040
41.3%
2 234360
21.7%
1 173880
 
16.1%
6 136080
 
12.6%
4 37800
 
3.5%
3 15120
 
1.4%
5 15120
 
1.4%
7 7560
 
0.7%
8 7560
 
0.7%
9 7560
 
0.7%
Uppercase Letter
ValueCountFrequency (%)
A 105840
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 1081080
91.1%
Latin 105840
 
8.9%

Most frequent character per script

Common
ValueCountFrequency (%)
0 446040
41.3%
2 234360
21.7%
1 173880
 
16.1%
6 136080
 
12.6%
4 37800
 
3.5%
3 15120
 
1.4%
5 15120
 
1.4%
7 7560
 
0.7%
8 7560
 
0.7%
9 7560
 
0.7%
Latin
ValueCountFrequency (%)
A 105840
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1186920
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 446040
37.6%
2 234360
19.7%
1 173880
 
14.6%
6 136080
 
11.5%
A 105840
 
8.9%
4 37800
 
3.2%
3 15120
 
1.3%
5 15120
 
1.3%
7 7560
 
0.6%
8 7560
 
0.6%

Sector
Categorical

UNIFORM 

Distinct5
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size827.0 KiB
Total non-profit institutions
21168 
Total non-profit institutions excluding governments
21168 
Non-profit institutions serving households (community organizations)
21168 
Business non-profit institutions
21168 
Government non-profit institutions
21168 

Length

Max length68
Median length34
Mean length42.8
Min length29

Characters and Unicode

Total characters4529952
Distinct characters29
Distinct categories6
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowTotal non-profit institutions
2nd rowTotal non-profit institutions
3rd rowTotal non-profit institutions
4th rowTotal non-profit institutions
5th rowTotal non-profit institutions

Common Values

ValueCountFrequency (%)
Total non-profit institutions 21168
20.0%
Total non-profit institutions excluding governments 21168
20.0%
Non-profit institutions serving households (community organizations) 21168
20.0%
Business non-profit institutions 21168
20.0%
Government non-profit institutions 21168
20.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
non-profit 105840
25.0%
institutions 105840
25.0%
total 42336
 
10.0%
excluding 21168
 
5.0%
governments 21168
 
5.0%
serving 21168
 
5.0%
households 21168
 
5.0%
community 21168
 
5.0%
organizations 21168
 
5.0%
business 21168
 
5.0%

Most occurring characters

ValueCountFrequency (%)
n 613872
13.6%
i 550368
12.1%
t 550368
12.1%
o 508032
11.2%
s 381024
 
8.4%
317520
 
7.0%
r 190512
 
4.2%
u 190512
 
4.2%
e 169344
 
3.7%
f 105840
 
2.3%
Other values (19) 952560
21.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3958416
87.4%
Space Separator 317520
 
7.0%
Dash Punctuation 105840
 
2.3%
Uppercase Letter 105840
 
2.3%
Open Punctuation 21168
 
0.5%
Close Punctuation 21168
 
0.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 613872
15.5%
i 550368
13.9%
t 550368
13.9%
o 508032
12.8%
s 381024
9.6%
r 190512
 
4.8%
u 190512
 
4.8%
e 169344
 
4.3%
f 105840
 
2.7%
p 105840
 
2.7%
Other values (11) 592704
15.0%
Uppercase Letter
ValueCountFrequency (%)
T 42336
40.0%
N 21168
20.0%
B 21168
20.0%
G 21168
20.0%
Space Separator
ValueCountFrequency (%)
317520
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 105840
100.0%
Open Punctuation
ValueCountFrequency (%)
( 21168
100.0%
Close Punctuation
ValueCountFrequency (%)
) 21168
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 4064256
89.7%
Common 465696
 
10.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 613872
15.1%
i 550368
13.5%
t 550368
13.5%
o 508032
12.5%
s 381024
9.4%
r 190512
 
4.7%
u 190512
 
4.7%
e 169344
 
4.2%
f 105840
 
2.6%
p 105840
 
2.6%
Other values (15) 698544
17.2%
Common
ValueCountFrequency (%)
317520
68.2%
- 105840
 
22.7%
( 21168
 
4.5%
) 21168
 
4.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4529952
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 613872
13.6%
i 550368
12.1%
t 550368
12.1%
o 508032
11.2%
s 381024
 
8.4%
317520
 
7.0%
r 190512
 
4.2%
u 190512
 
4.2%
e 169344
 
3.7%
f 105840
 
2.3%
Other values (19) 952560
21.0%

Characteristics
Categorical

UNIFORM 

Distinct18
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size827.0 KiB
Male employees
 
5880
Female employees
 
5880
Immigrant employees
 
5880
Non-immigrant employees
 
5880
Indigenous identity employees
 
5880
Other values (13)
76440 

Length

Max length33
Median length28
Mean length19.5
Min length14

Characters and Unicode

Total characters2063880
Distinct characters37
Distinct categories5
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowMale employees
2nd rowMale employees
3rd rowMale employees
4th rowMale employees
5th rowMale employees

Common Values

ValueCountFrequency (%)
Male employees 5880
 
5.6%
Female employees 5880
 
5.6%
Immigrant employees 5880
 
5.6%
Non-immigrant employees 5880
 
5.6%
Indigenous identity employees 5880
 
5.6%
Non-indigenous identity employees 5880
 
5.6%
Visible minority 5880
 
5.6%
Not a visible minority 5880
 
5.6%
High school diploma and less 5880
 
5.6%
Trade certificate 5880
 
5.6%
Other values (8) 47040
44.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
employees 35280
 
10.3%
years 35280
 
10.3%
to 29400
 
8.6%
and 17640
 
5.2%
identity 11760
 
3.4%
visible 11760
 
3.4%
minority 11760
 
3.4%
diploma 11760
 
3.4%
male 5880
 
1.7%
34 5880
 
1.7%
Other values (28) 164640
48.3%

Most occurring characters

ValueCountFrequency (%)
e 264600
12.8%
235200
 
11.4%
i 152880
 
7.4%
o 147000
 
7.1%
s 117600
 
5.7%
a 105840
 
5.1%
l 99960
 
4.8%
y 99960
 
4.8%
t 99960
 
4.8%
r 94080
 
4.6%
Other values (27) 646800
31.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1617000
78.3%
Space Separator 235200
 
11.4%
Decimal Number 129360
 
6.3%
Uppercase Letter 70560
 
3.4%
Dash Punctuation 11760
 
0.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 264600
16.4%
i 152880
9.5%
o 147000
9.1%
s 117600
 
7.3%
a 105840
 
6.5%
l 99960
 
6.2%
y 99960
 
6.2%
t 99960
 
6.2%
r 94080
 
5.8%
n 94080
 
5.8%
Other values (10) 341040
21.1%
Uppercase Letter
ValueCountFrequency (%)
N 17640
25.0%
I 11760
16.7%
H 5880
 
8.3%
T 5880
 
8.3%
C 5880
 
8.3%
U 5880
 
8.3%
V 5880
 
8.3%
F 5880
 
8.3%
M 5880
 
8.3%
Decimal Number
ValueCountFrequency (%)
5 47040
36.4%
4 41160
31.8%
2 11760
 
9.1%
3 11760
 
9.1%
6 11760
 
9.1%
1 5880
 
4.5%
Space Separator
ValueCountFrequency (%)
235200
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 11760
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1687560
81.8%
Common 376320
 
18.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 264600
15.7%
i 152880
 
9.1%
o 147000
 
8.7%
s 117600
 
7.0%
a 105840
 
6.3%
l 99960
 
5.9%
y 99960
 
5.9%
t 99960
 
5.9%
r 94080
 
5.6%
n 94080
 
5.6%
Other values (19) 411600
24.4%
Common
ValueCountFrequency (%)
235200
62.5%
5 47040
 
12.5%
4 41160
 
10.9%
2 11760
 
3.1%
3 11760
 
3.1%
- 11760
 
3.1%
6 11760
 
3.1%
1 5880
 
1.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2063880
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 264600
12.8%
235200
 
11.4%
i 152880
 
7.4%
o 147000
 
7.1%
s 117600
 
5.7%
a 105840
 
5.1%
l 99960
 
4.8%
y 99960
 
4.8%
t 99960
 
4.8%
r 94080
 
4.6%
Other values (27) 646800
31.3%

Indicators
Categorical

HIGH CORRELATION  UNIFORM 

Distinct7
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size827.0 KiB
Number of jobs
15120 
Hours worked
15120 
Wages and salaries
15120 
Average annual hours worked
15120 
Average weekly hours worked
15120 
Other values (2)
30240 

Length

Max length33
Median length19
Mean length21.428571
Min length12
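
The length statistics for Indicators can be re-derived from the seven labels listed in this variable's Common Values table, because every label occurs 15120 times. The sketch below uses only the Python standard library; the variable names are illustrative and not part of the report.

```python
from statistics import mean, median

# The seven Indicators labels reported in the Common Values table.
labels = [
    "Number of jobs",
    "Hours worked",
    "Wages and salaries",
    "Average annual hours worked",
    "Average weekly hours worked",
    "Average annual wages and salaries",
    "Average hourly wage",
]
rows_per_label = 15120  # every label has the same count in the report

lengths = [len(label) for label in labels]

# Equal counts per label, so statistics over the 7 labels equal the
# statistics over all 105840 rows.
summary = {
    "max": max(lengths),
    "median": median(lengths),
    "mean": round(mean(lengths), 6),
    "min": min(lengths),
    "total_characters": sum(lengths) * rows_per_label,
}
print(summary)
```

The resulting values can be compared with Max length, Median length, Mean length, Min length, and Total characters as reported for this variable.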

Characters and Unicode

Total characters2268000
Distinct characters25
Distinct categories3
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowNumber of jobs
2nd rowHours worked
3rd rowWages and salaries
4th rowAverage annual hours worked
5th rowAverage weekly hours worked

Common Values

ValueCountFrequency (%)
Number of jobs 15120
14.3%
Hours worked 15120
14.3%
Wages and salaries 15120
14.3%
Average annual hours worked 15120
14.3%
Average weekly hours worked 15120
14.3%
Average annual wages and salaries 15120
14.3%
Average hourly wage 15120
14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
average 60480
16.7%
hours 45360
12.5%
worked 45360
12.5%
wages 30240
8.3%
and 30240
8.3%
salaries 30240
8.3%
annual 30240
8.3%
number 15120
 
4.2%
of 15120
 
4.2%
jobs 15120
 
4.2%
Other values (3) 45360
12.5%

Most occurring characters

ValueCountFrequency (%)
e 287280
12.7%
a 257040
11.3%
257040
11.3%
r 211680
 
9.3%
s 151200
 
6.7%
o 136080
 
6.0%
u 105840
 
4.7%
g 105840
 
4.7%
n 90720
 
4.0%
l 90720
 
4.0%
Other values (15) 574560
25.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1905120
84.0%
Space Separator 257040
 
11.3%
Uppercase Letter 105840
 
4.7%
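
The category names in this table correspond to Unicode general categories. As a hedged illustration, the sketch below tallies them for the seven Indicators labels with Python's `unicodedata` module and scales by the 15120 rows per label. The name mapping `CATEGORY_NAMES` is an assumption made for this example and covers only the categories that occur here.

```python
import unicodedata
from collections import Counter

labels = [
    "Number of jobs",
    "Hours worked",
    "Wages and salaries",
    "Average annual hours worked",
    "Average weekly hours worked",
    "Average annual wages and salaries",
    "Average hourly wage",
]
rows_per_label = 15120  # count of each label in the report

# Two-letter general-category codes mapped to the names used in the report.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
}

counts = Counter()
for label in labels:
    for ch in label:
        code = unicodedata.category(ch)
        # Fall back to the raw code for categories not named above.
        counts[CATEGORY_NAMES.get(code, code)] += rows_per_label

for name, count in counts.most_common():
    print(name, count)
```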

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 287280
15.1%
a 257040
13.5%
r 211680
11.1%
s 151200
 
7.9%
o 136080
 
7.1%
u 105840
 
5.6%
g 105840
 
5.6%
n 90720
 
4.8%
l 90720
 
4.8%
w 90720
 
4.8%
Other values (10) 378000
19.8%
Uppercase Letter
ValueCountFrequency (%)
A 60480
57.1%
W 15120
 
14.3%
H 15120
 
14.3%
N 15120
 
14.3%
Space Separator
ValueCountFrequency (%)
257040
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2010960
88.7%
Common 257040
 
11.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 287280
14.3%
a 257040
12.8%
r 211680
10.5%
s 151200
 
7.5%
o 136080
 
6.8%
u 105840
 
5.3%
g 105840
 
5.3%
n 90720
 
4.5%
l 90720
 
4.5%
w 90720
 
4.5%
Other values (14) 483840
24.1%
Common
ValueCountFrequency (%)
257040
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2268000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 287280
12.7%
a 257040
11.3%
257040
11.3%
r 211680
 
9.3%
s 151200
 
6.7%
o 136080
 
6.0%
u 105840
 
4.7%
g 105840
 
4.7%
n 90720
 
4.0%
l 90720
 
4.0%
Other values (15) 574560
25.3%

UOM
Categorical

HIGH CORRELATION 

Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size827.0 KiB
Hours
45360 
Dollars
45360 
Jobs
15120 

Length

Max length7
Median length5
Mean length5.7142857
Min length4

Characters and Unicode

Total characters604800
Distinct characters10
Distinct categories2
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowJobs
2nd rowHours
3rd rowDollars
4th rowHours
5th rowHours

Common Values

ValueCountFrequency (%)
Hours 45360
42.9%
Dollars 45360
42.9%
Jobs 15120
 
14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
hours 45360
42.9%
dollars 45360
42.9%
jobs 15120
 
14.3%

Most occurring characters

ValueCountFrequency (%)
o 105840
17.5%
s 105840
17.5%
r 90720
15.0%
l 90720
15.0%
H 45360
7.5%
u 45360
7.5%
D 45360
7.5%
a 45360
7.5%
J 15120
 
2.5%
b 15120
 
2.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 498960
82.5%
Uppercase Letter 105840
 
17.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 105840
21.2%
s 105840
21.2%
r 90720
18.2%
l 90720
18.2%
u 45360
9.1%
a 45360
9.1%
b 15120
 
3.0%
Uppercase Letter
ValueCountFrequency (%)
H 45360
42.9%
D 45360
42.9%
J 15120
 
14.3%

Most occurring scripts

ValueCountFrequency (%)
Latin 604800
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 105840
17.5%
s 105840
17.5%
r 90720
15.0%
l 90720
15.0%
H 45360
7.5%
u 45360
7.5%
D 45360
7.5%
a 45360
7.5%
J 15120
 
2.5%
b 15120
 
2.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 604800
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 105840
17.5%
s 105840
17.5%
r 90720
15.0%
l 90720
15.0%
H 45360
7.5%
u 45360
7.5%
D 45360
7.5%
a 45360
7.5%
J 15120
 
2.5%
b 15120
 
2.5%

UOM_ID
Categorical

HIGH CORRELATION 

Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size827.0 KiB
152
45360 
81
45360 
190
15120 

Length

Max length3
Median length3
Mean length2.5714286
Min length2

Characters and Unicode

Total characters272160
Distinct characters6
Distinct categories1
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st row190
2nd row152
3rd row81
4th row152
5th row152

Common Values

ValueCountFrequency (%)
152 45360
42.9%
81 45360
42.9%
190 15120
 
14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
152 45360
42.9%
81 45360
42.9%
190 15120
 
14.3%

Most occurring characters

ValueCountFrequency (%)
1 105840
38.9%
5 45360
16.7%
2 45360
16.7%
8 45360
16.7%
9 15120
 
5.6%
0 15120
 
5.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 272160
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 105840
38.9%
5 45360
16.7%
2 45360
16.7%
8 45360
16.7%
9 15120
 
5.6%
0 15120
 
5.6%

Most occurring scripts

ValueCountFrequency (%)
Common 272160
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1 105840
38.9%
5 45360
16.7%
2 45360
16.7%
8 45360
16.7%
9 15120
 
5.6%
0 15120
 
5.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 272160
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 105840
38.9%
5 45360
16.7%
2 45360
16.7%
8 45360
16.7%
9 15120
 
5.6%
0 15120
 
5.6%

SCALAR_FACTOR
Categorical

HIGH CORRELATION 

Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size827.0 KiB
units
75600 
thousands
15120 
millions
15120 

Length

Max length9
Median length5
Mean length6
Min length5

Characters and Unicode

Total characters635040
Distinct characters11
Distinct categories1
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowunits
2nd rowthousands
3rd rowmillions
4th rowunits
5th rowunits

Common Values

ValueCountFrequency (%)
units 75600
71.4%
thousands 15120
 
14.3%
millions 15120
 
14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
units 75600
71.4%
thousands 15120
 
14.3%
millions 15120
 
14.3%

Most occurring characters

ValueCountFrequency (%)
s 120960
19.0%
n 105840
16.7%
i 105840
16.7%
u 90720
14.3%
t 90720
14.3%
o 30240
 
4.8%
l 30240
 
4.8%
h 15120
 
2.4%
a 15120
 
2.4%
d 15120
 
2.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 635040
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
s 120960
19.0%
n 105840
16.7%
i 105840
16.7%
u 90720
14.3%
t 90720
14.3%
o 30240
 
4.8%
l 30240
 
4.8%
h 15120
 
2.4%
a 15120
 
2.4%
d 15120
 
2.4%

Most occurring scripts

ValueCountFrequency (%)
Latin 635040
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
s 120960
19.0%
n 105840
16.7%
i 105840
16.7%
u 90720
14.3%
t 90720
14.3%
o 30240
 
4.8%
l 30240
 
4.8%
h 15120
 
2.4%
a 15120
 
2.4%
d 15120
 
2.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 635040
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
s 120960
19.0%
n 105840
16.7%
i 105840
16.7%
u 90720
14.3%
t 90720
14.3%
o 30240
 
4.8%
l 30240
 
4.8%
h 15120
 
2.4%
a 15120
 
2.4%
d 15120
 
2.4%

SCALAR_ID
Categorical

HIGH CORRELATION 

Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size827.0 KiB
0
75600 
3
15120 
6
15120 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters105840
Distinct characters3
Distinct categories1
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st row0
2nd row3
3rd row6
4th row0
5th row0

Common Values

ValueCountFrequency (%)
0 75600
71.4%
3 15120
 
14.3%
6 15120
 
14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0 75600
71.4%
3 15120
 
14.3%
6 15120
 
14.3%

Most occurring characters

ValueCountFrequency (%)
0 75600
71.4%
3 15120
 
14.3%
6 15120
 
14.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 105840
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 75600
71.4%
3 15120
 
14.3%
6 15120
 
14.3%

Most occurring scripts

ValueCountFrequency (%)
Common 105840
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 75600
71.4%
3 15120
 
14.3%
6 15120
 
14.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 105840
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 75600
71.4%
3 15120
 
14.3%
6 15120
 
14.3%
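
The SCALAR_ID values 0, 3, and 6 line up with the SCALAR_FACTOR labels units, thousands, and millions, which suggests that SCALAR_ID is the power of ten applied to VALUE. That reading is an assumption, not something the report states. The sketch below applies it to the first three Sample rows (Canada, 2010, male employees) and checks the result against the averages reported in the following rows; `to_base_units` is a hypothetical helper.

```python
# Assumed mapping: SCALAR_ID is the exponent of ten behind SCALAR_FACTOR.
SCALAR_FACTOR_BY_ID = {0: "units", 3: "thousands", 6: "millions"}


def to_base_units(value: float, scalar_id: int) -> float:
    """Scale a reported VALUE to base units (jobs, hours, or dollars)."""
    return value * 10 ** scalar_id


# Canada, 2010, total non-profit institutions, male employees (Sample rows 0-2).
jobs = to_base_units(642584.00, 0)    # Number of jobs, units
hours = to_base_units(1048516.00, 3)  # Hours worked, thousands
wages = to_base_units(30805.00, 6)    # Wages and salaries, millions

avg_annual_hours = hours / jobs  # the report lists 1632 for this series
avg_hourly_wage = wages / hours  # the report lists 29.38 for this series
print(round(avg_annual_hours), round(avg_hourly_wage, 2))
```

Under this assumption the derived averages agree with the reported ones after rounding, which supports treating SCALAR_ID as an exponent when combining indicators.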

VECTOR
Text

Distinct8820
Distinct (%)8.3%
Missing0
Missing (%)0.0%
Memory size827.0 KiB

Length

Max length11
Median length11
Mean length11
Min length11

Characters and Unicode

Total characters1164240
Distinct characters11
Distinct categories2
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowv1273033811
2nd rowv1273033812
3rd rowv1273033813
4th rowv1273033814
5th rowv1273033815
ValueCountFrequency (%)
v1273033811 12
 
< 0.1%
v1273033912 12
 
< 0.1%
v1273034009 12
 
< 0.1%
v1273034008 12
 
< 0.1%
v1273034007 12
 
< 0.1%
v1273033915 12
 
< 0.1%
v1273033914 12
 
< 0.1%
v1273033913 12
 
< 0.1%
v1273033911 12
 
< 0.1%
v1273034698 12
 
< 0.1%
Other values (8810) 105720
99.9%

Most occurring characters

ValueCountFrequency (%)
3 214332
18.4%
1 149892
12.9%
0 149784
12.9%
7 148584
12.8%
2 145476
12.5%
v 105840
9.1%
4 75516
 
6.5%
5 43944
 
3.8%
9 43944
 
3.8%
8 43812
 
3.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1058400
90.9%
Lowercase Letter 105840
 
9.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3 214332
20.3%
1 149892
14.2%
0 149784
14.2%
7 148584
14.0%
2 145476
13.7%
4 75516
 
7.1%
5 43944
 
4.2%
9 43944
 
4.2%
8 43812
 
4.1%
6 43116
 
4.1%
Lowercase Letter
ValueCountFrequency (%)
v 105840
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 1058400
90.9%
Latin 105840
 
9.1%

Most frequent character per script

Common
ValueCountFrequency (%)
3 214332
20.3%
1 149892
14.2%
0 149784
14.2%
7 148584
14.0%
2 145476
13.7%
4 75516
 
7.1%
5 43944
 
4.2%
9 43944
 
4.2%
8 43812
 
4.1%
6 43116
 
4.1%
Latin
ValueCountFrequency (%)
v 105840
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1164240
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3 214332
18.4%
1 149892
12.9%
0 149784
12.9%
7 148584
12.8%
2 145476
12.5%
v 105840
9.1%
4 75516
 
6.5%
5 43944
 
3.8%
9 43944
 
3.8%
8 43812
 
3.8%

COORDINATE
Text

Distinct8820
Distinct (%)8.3%
Missing0
Missing (%)0.0%
Memory size827.0 KiB

Length

Max length9
Median length8.5
Mean length7.8571429
Min length7

Characters and Unicode

Total characters831600
Distinct characters11
Distinct categories2
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st row1.1.1.1
2nd row1.1.1.2
3rd row1.1.1.3
4th row1.1.1.4
5th row1.1.1.5
ValueCountFrequency (%)
1.1.1.1 12
 
< 0.1%
1.1.2.4 12
 
< 0.1%
1.1.3.3 12
 
< 0.1%
1.1.3.2 12
 
< 0.1%
1.1.3.1 12
 
< 0.1%
1.1.2.7 12
 
< 0.1%
1.1.2.6 12
 
< 0.1%
1.1.2.5 12
 
< 0.1%
1.1.2.3 12
 
< 0.1%
1.1.10.6 12
 
< 0.1%
Other values (8810) 105720
99.9%

Most occurring characters

ValueCountFrequency (%)
. 317520
38.2%
1 153888
18.5%
2 63168
 
7.6%
3 63168
 
7.6%
4 63168
 
7.6%
5 55608
 
6.7%
6 34440
 
4.1%
7 34440
 
4.1%
8 19320
 
2.3%
0 13440
 
1.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 514080
61.8%
Other Punctuation 317520
38.2%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 153888
29.9%
2 63168
12.3%
3 63168
12.3%
4 63168
12.3%
5 55608
 
10.8%
6 34440
 
6.7%
7 34440
 
6.7%
8 19320
 
3.8%
0 13440
 
2.6%
9 13440
 
2.6%
Other Punctuation
ValueCountFrequency (%)
. 317520
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 831600
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
. 317520
38.2%
1 153888
18.5%
2 63168
 
7.6%
3 63168
 
7.6%
4 63168
 
7.6%
5 55608
 
6.7%
6 34440
 
4.1%
7 34440
 
4.1%
8 19320
 
2.3%
0 13440
 
1.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 831600
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
. 317520
38.2%
1 153888
18.5%
2 63168
 
7.6%
3 63168
 
7.6%
4 63168
 
7.6%
5 55608
 
6.7%
6 34440
 
4.1%
7 34440
 
4.1%
8 19320
 
2.3%
0 13440
 
1.6%

VALUE
Real number (ℝ)

MISSING 

Distinct35398
Distinct (%)34.4%
Missing3024
Missing (%)2.9%
Infinite0
Infinite (%)0.0%
Mean26419.543
Minimum0
Maximum3857813
Zeros9
Zeros (%)< 0.1%
Negative0
Negative (%)0.0%
Memory size827.0 KiB

Quantile statistics

Minimum0
5-th percentile18.6075
Q132
median1386
Q317660
95-th percentile85576.5
Maximum3857813
Range3857813
Interquartile range (IQR)17628

Descriptive statistics

Standard deviation118041.35
Coefficient of variation (CV)4.4679558
Kurtosis266.93641
Mean26419.543
Median Absolute Deviation (MAD)1358
Skewness13.831207
Sum2.7163518 × 10^9
Variance1.393376 × 10^10
MonotonicityNot monotonic
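
Several of the VALUE statistics are tied together by simple identities, which makes them easy to sanity-check. The sketch below recomputes the coefficient of variation, the sum (about 2.72 billion), the variance, and the interquartile range from the other reported figures. It assumes that the mean and sum are taken over the non-missing values only.

```python
import math

# Figures copied from the VALUE section of the report.
n_rows = 105840
n_missing = 3024
mean = 26419.543
std = 118041.35
q1, q3 = 32, 17660

n_valid = n_rows - n_missing  # values that enter the statistics (assumed)
cv = std / mean               # coefficient of variation
total = mean * n_valid        # sum of all non-missing values
variance = std ** 2           # variance is the squared standard deviation
iqr = q3 - q1                 # interquartile range

print(n_valid, round(cv, 4), f"{total:.4e}", f"{variance:.4e}", iqr)
print(math.isclose(cv, 4.4679558, rel_tol=1e-5))
```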
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
30 | 2040 | 1.9%
31 | 1949 | 1.8%
32 | 1844 | 1.7%
29 | 1562 | 1.5%
33 | 1476 | 1.4%
34 | 1005 | 0.9%
28 | 905 | 0.9%
35 | 638 | 0.6%
27 | 523 | 0.5%
36 | 457 | 0.4%
Other values (35388) | 90417 | 85.4%
(Missing) | 3024 | 2.9%

Minimum 10 values

Value | Count | Frequency (%)
0 | 9 | < 0.1%
1 | 219 | 0.2%
2 | 299 | 0.3%
3 | 201 | 0.2%
4 | 203 | 0.2%
5 | 157 | 0.1%
6 | 147 | 0.1%
7 | 135 | 0.1%
8 | 179 | 0.2%
9 | 145 | 0.1%

Maximum 10 values

Value | Count | Frequency (%)
3857813 | 1 | < 0.1%
3690238 | 1 | < 0.1%
3643952 | 1 | < 0.1%
3581317 | 1 | < 0.1%
3556730 | 1 | < 0.1%
3544822 | 1 | < 0.1%
3487265 | 1 | < 0.1%
3425218 | 1 | < 0.1%
3402343 | 1 | < 0.1%
3372266 | 1 | < 0.1%

STATUS
Categorical

CONSTANT  MISSING 

Distinct1
Distinct (%)< 0.1%
Missing102816
Missing (%)97.1%
Memory size827.0 KiB
x
3024 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters3024
Distinct characters1
Distinct categories1
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowx
2nd rowx
3rd rowx
4th rowx
5th rowx

Common Values

ValueCountFrequency (%)
x 3024
 
2.9%
(Missing) 102816
97.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
x 3024
100.0%

Most occurring characters

ValueCountFrequency (%)
x 3024
100.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3024
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
x 3024
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 3024
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
x 3024
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3024
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
x 3024
100.0%

SYMBOL
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing105840
Missing (%)100.0%
Memory size827.0 KiB

TERMINATED
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing105840
Missing (%)100.0%
Memory size827.0 KiB

DECIMALS
Categorical

HIGH CORRELATION 

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size827.0 KiB
0
90720 
2
15120 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters105840
Distinct characters2
Distinct categories1
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row0

Common Values

ValueCountFrequency (%)
0 90720
85.7%
2 15120
 
14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0 90720
85.7%
2 15120
 
14.3%

Most occurring characters

ValueCountFrequency (%)
0 90720
85.7%
2 15120
 
14.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 105840
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 90720
85.7%
2 15120
 
14.3%

Most occurring scripts

ValueCountFrequency (%)
Common 105840
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 90720
85.7%
2 15120
 
14.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 105840
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 90720
85.7%
2 15120
 
14.3%

Interactions

[Figure: interaction plots, 4 panels]

Correlations

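Variable | REF_DATE | VALUE | GEO | DGUID | Sector | Characteristics | Indicators | UOM | UOM_ID | SCALAR_FACTOR | SCALAR_ID | DECIMALS
REF_DATE | 1.000 | 0.027 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
VALUE | 0.027 | 1.000 | 0.090 | 0.090 | 0.051 | 0.044 | 0.075 | 0.081 | 0.081 | 0.105 | 0.105 | 0.044
GEO | 0.000 | 0.090 | 1.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
DGUID | 0.000 | 0.090 | 1.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
Sector | 0.000 | 0.051 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
Characteristics | 0.000 | 0.044 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
Indicators | 0.000 | 0.075 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
UOM | 0.000 | 0.081 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000 | 1.000 | 0.447 | 0.447 | 0.471
UOM_ID | 0.000 | 0.081 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000 | 1.000 | 0.447 | 0.447 | 0.471
SCALAR_FACTOR | 0.000 | 0.105 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.447 | 0.447 | 1.000 | 1.000 | 0.258
SCALAR_ID | 0.000 | 0.105 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.447 | 0.447 | 1.000 | 1.000 | 0.258
DECIMALS | 0.000 | 0.044 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.471 | 0.471 | 0.258 | 0.258 | 1.000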
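
The categorical entries in this matrix are consistent with Cramér's V computed on the contingency table of each pair. That the profiler scored these pairs this way is an assumption here. The sketch uses the uncorrected form of the statistic and ignores any bias correction the tool may apply. The per-indicator attributes are read off the Sample rows of this report. Because every indicator occurs equally often, one row per indicator gives the same V as the full dataset.

```python
import math
from collections import Counter

# Indicator -> (UOM, SCALAR_FACTOR, DECIMALS), taken from the Sample rows.
INDICATORS = {
    "Number of jobs": ("Jobs", "units", 0),
    "Hours worked": ("Hours", "thousands", 0),
    "Wages and salaries": ("Dollars", "millions", 0),
    "Average annual hours worked": ("Hours", "units", 0),
    "Average weekly hours worked": ("Hours", "units", 0),
    "Average annual wages and salaries": ("Dollars", "units", 0),
    "Average hourly wage": ("Dollars", "units", 2),
}


def cramers_v(pairs):
    """Uncorrected Cramér's V for a list of (x, y) category pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    row = Counter(x for x, _ in pairs)
    col = Counter(y for _, y in pairs)
    chi2 = 0.0
    for x in row:
        for y in col:
            expected = row[x] * col[y] / n
            chi2 += (joint[(x, y)] - expected) ** 2 / expected
    return math.sqrt(chi2 / n / min(len(row) - 1, len(col) - 1))


rows = list(INDICATORS.values())
uom_vs_scalar = cramers_v([(u, s) for u, s, _ in rows])
uom_vs_decimals = cramers_v([(u, d) for u, _, d in rows])
indicator_vs_uom = cramers_v([(name, a[0]) for name, a in INDICATORS.items()])
print(round(uom_vs_scalar, 3), round(uom_vs_decimals, 3), round(indicator_vs_uom, 3))
```

The results can be compared with the UOM/SCALAR_FACTOR, UOM/DECIMALS, and Indicators/UOM cells. A value of 1.000 against Indicators reflects that each indicator determines its unit, scalar factor, and decimals.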

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

(index) | REF_DATE | GEO | DGUID | Sector | Characteristics | Indicators | UOM | UOM_ID | SCALAR_FACTOR | SCALAR_ID | VECTOR | COORDINATE | VALUE | STATUS | SYMBOL | TERMINATED | DECIMALS
0 | 2010 | Canada | 2016A000011124 | Total non-profit institutions | Male employees | Number of jobs | Jobs | 190 | units | 0 | v1273033811 | 1.1.1.1 | 642584.00 | NaN | NaN | NaN | 0
1 | 2010 | Canada | 2016A000011124 | Total non-profit institutions | Male employees | Hours worked | Hours | 152 | thousands | 3 | v1273033812 | 1.1.1.2 | 1048516.00 | NaN | NaN | NaN | 0
2 | 2010 | Canada | 2016A000011124 | Total non-profit institutions | Male employees | Wages and salaries | Dollars | 81 | millions | 6 | v1273033813 | 1.1.1.3 | 30805.00 | NaN | NaN | NaN | 0
3 | 2010 | Canada | 2016A000011124 | Total non-profit institutions | Male employees | Average annual hours worked | Hours | 152 | units | 0 | v1273033814 | 1.1.1.4 | 1632.00 | NaN | NaN | NaN | 0
4 | 2010 | Canada | 2016A000011124 | Total non-profit institutions | Male employees | Average weekly hours worked | Hours | 152 | units | 0 | v1273033815 | 1.1.1.5 | 31.00 | NaN | NaN | NaN | 0
5 | 2010 | Canada | 2016A000011124 | Total non-profit institutions | Male employees | Average annual wages and salaries | Dollars | 81 | units | 0 | v1273033816 | 1.1.1.6 | 47940.00 | NaN | NaN | NaN | 0
6 | 2010 | Canada | 2016A000011124 | Total non-profit institutions | Male employees | Average hourly wage | Dollars | 81 | units | 0 | v1273033817 | 1.1.1.7 | 29.38 | NaN | NaN | NaN | 2
7 | 2010 | Canada | 2016A000011124 | Total non-profit institutions | Female employees | Number of jobs | Jobs | 190 | units | 0 | v1273033909 | 1.1.2.1 | 1500394.00 | NaN | NaN | NaN | 0
8 | 2010 | Canada | 2016A000011124 | Total non-profit institutions | Female employees | Hours worked | Hours | 152 | thousands | 3 | v1273033910 | 1.1.2.2 | 2331018.00 | NaN | NaN | NaN | 0
9 | 2010 | Canada | 2016A000011124 | Total non-profit institutions | Female employees | Wages and salaries | Dollars | 81 | millions | 6 | v1273033911 | 1.1.2.3 | 60943.00 | NaN | NaN | NaN | 0

(index) | REF_DATE | GEO | DGUID | Sector | Characteristics | Indicators | UOM | UOM_ID | SCALAR_FACTOR | SCALAR_ID | VECTOR | COORDINATE | VALUE | STATUS | SYMBOL | TERMINATED | DECIMALS
105830 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 55 to 64 years | Average weekly hours worked | Hours | 152 | units | 0 | v1273042530 | 14.5.17.5 | 33.00 | NaN | NaN | NaN | 0
105831 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 55 to 64 years | Average annual wages and salaries | Dollars | 81 | units | 0 | v1273042531 | 14.5.17.6 | 101380.00 | NaN | NaN | NaN | 0
105832 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 55 to 64 years | Average hourly wage | Dollars | 81 | units | 0 | v1273042532 | 14.5.17.7 | 59.98 | NaN | NaN | NaN | 2
105833 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 65 years old and over | Number of jobs | Jobs | 190 | units | 0 | v1273042624 | 14.5.18.1 | 27.00 | NaN | NaN | NaN | 0
105834 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 65 years old and over | Hours worked | Hours | 152 | thousands | 3 | v1273042625 | 14.5.18.2 | 30.00 | NaN | NaN | NaN | 0
105835 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 65 years old and over | Wages and salaries | Dollars | 81 | millions | 6 | v1273042626 | 14.5.18.3 | 2.00 | NaN | NaN | NaN | 0
105836 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 65 years old and over | Average annual hours worked | Hours | 152 | units | 0 | v1273042627 | 14.5.18.4 | 1111.00 | NaN | NaN | NaN | 0
105837 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 65 years old and over | Average weekly hours worked | Hours | 152 | units | 0 | v1273042628 | 14.5.18.5 | 21.00 | NaN | NaN | NaN | 0
105838 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 65 years old and over | Average annual wages and salaries | Dollars | 81 | units | 0 | v1273042629 | 14.5.18.6 | 74037.00 | NaN | NaN | NaN | 0
105839 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 65 years old and over | Average hourly wage | Dollars | 81 | units | 0 | v1273042630 | 14.5.18.7 | 66.63 | NaN | NaN | NaN | 2
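
The COORDINATE of the last sample row, 14.5.18.7, appears to encode the member position in each dimension in the order GEO, Sector, Characteristics, Indicators. That reading is an assumption, as is the idea that the last row carries the highest position in every dimension. Under both assumptions the dimension sizes multiply out to the number of distinct series and, over 12 reference years, to the number of observations. `parse_coordinate` is a hypothetical helper.

```python
def parse_coordinate(coordinate: str) -> tuple:
    """Split a dotted COORDINATE string into integer member positions."""
    return tuple(int(part) for part in coordinate.split("."))


# Last Sample row (Nunavut, 2021, 65 years old and over, Average hourly wage).
last = parse_coordinate("14.5.18.7")
n_geo, n_sector, n_characteristics, n_indicators = last
n_years = 12  # REF_DATE spans 2010 to 2021

n_series = n_geo * n_sector * n_characteristics * n_indicators
n_rows = n_series * n_years
print(n_series, n_rows)
```

The two products can be compared with the 8820 distinct VECTOR and COORDINATE values and the 105840 observations reported in the overview.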
Pandas Profiling Report

Overview

Dataset statistics

Number of variables17
Number of observations105840
Missing cells317520
Missing cells (%)17.6%
Duplicate rows0
Duplicate rows (%)0.0%
Total size in memory13.7 MiB
Average record size in memory136.0 B

Variable types

Numeric2
Categorical11
Text2
Unsupported2

Alerts

STATUS has constant value "" (Constant)
GEO is highly overall correlated with DGUID (High correlation)
DGUID is highly overall correlated with GEO (High correlation)
Indicators is highly overall correlated with UOM and 4 other fields (High correlation)
UOM is highly overall correlated with Indicators and 1 other fields (High correlation)
UOM_ID is highly overall correlated with Indicators and 1 other fields (High correlation)
SCALAR_FACTOR is highly overall correlated with Indicators and 1 other fields (High correlation)
SCALAR_ID is highly overall correlated with Indicators and 1 other fields (High correlation)
DECIMALS is highly overall correlated with Indicators (High correlation)
VALUE has 3024 (2.9%) missing values (Missing)
STATUS has 102816 (97.1%) missing values (Missing)
SYMBOL has 105840 (100.0%) missing values (Missing)
TERMINATED has 105840 (100.0%) missing values (Missing)
GEO is uniformly distributed (Uniform)
DGUID is uniformly distributed (Uniform)
Sector is uniformly distributed (Uniform)
Characteristics is uniformly distributed (Uniform)
Indicators is uniformly distributed (Uniform)
SYMBOL is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
TERMINATED is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
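
The overview's missing-cell total follows directly from the four per-column missing counts in the alerts. A short worked check, with illustrative variable names:

```python
n_rows = 105840
n_columns = 17

# Missing counts per column, as listed in the alerts.
missing_by_column = {
    "VALUE": 3024,
    "STATUS": 102816,
    "SYMBOL": 105840,
    "TERMINATED": 105840,
}

missing_cells = sum(missing_by_column.values())
missing_pct = 100 * missing_cells / (n_rows * n_columns)
print(missing_cells, round(missing_pct, 1))
```

The VALUE and STATUS counts also sum to exactly one full column (3024 + 102816 = 105840). This is consistent with STATUS being set only on rows where VALUE is absent, though the report does not state that directly.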

Reproduction

Analysis started2023-12-20 15:44:57.530353
Analysis finished2023-12-20 15:45:07.048776
Duration9.52 seconds
Software versionydata-profiling v4.5.1
Download configurationconfig.json

Variables

REF_DATE
Real number (ℝ)

Distinct12
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean2015.5
Minimum2010
Maximum2021
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size827.0 KiB

Quantile statistics

Minimum2010
5-th percentile2010
Q12012.75
median2015.5
Q32018.25
95-th percentile2021
Maximum2021
Range11
Interquartile range (IQR)5.5

Descriptive statistics

Standard deviation3.4520688
Coefficient of variation (CV)0.0017127605
Kurtosis-1.216784
Mean2015.5
Median Absolute Deviation (MAD)3
Skewness0
Sum2.1332052 × 10^8
Variance11.916779
MonotonicityIncreasing
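
Because REF_DATE takes each of the 12 years from 2010 to 2021 exactly 8820 times, its descriptive statistics can be reproduced in closed form. The sketch below assumes the profiler reports the sample variance (n - 1 denominator), which is the convention that matches the figures above.

```python
import math

years = range(2010, 2022)  # 12 distinct reference years
rows_per_year = 8820
n = rows_per_year * len(years)

mean = sum(y * rows_per_year for y in years) / n
# Sample variance with an n - 1 denominator (assumed convention).
sum_sq = sum(rows_per_year * (y - mean) ** 2 for y in years)
variance = sum_sq / (n - 1)
std = math.sqrt(variance)
total = mean * n

print(mean, round(variance, 6), round(std, 7), total)
```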
Histogram with fixed size bins (bins=12)
Common values

Value | Count | Frequency (%)
2010 | 8820 | 8.3%
2011 | 8820 | 8.3%
2012 | 8820 | 8.3%
2013 | 8820 | 8.3%
2014 | 8820 | 8.3%
2015 | 8820 | 8.3%
2016 | 8820 | 8.3%
2017 | 8820 | 8.3%
2018 | 8820 | 8.3%
2019 | 8820 | 8.3%
Other values (2) | 17640 | 16.7%

Minimum 10 values

Value | Count | Frequency (%)
2010 | 8820 | 8.3%
2011 | 8820 | 8.3%
2012 | 8820 | 8.3%
2013 | 8820 | 8.3%
2014 | 8820 | 8.3%
2015 | 8820 | 8.3%
2016 | 8820 | 8.3%
2017 | 8820 | 8.3%
2018 | 8820 | 8.3%
2019 | 8820 | 8.3%

Maximum 10 values

Value | Count | Frequency (%)
2021 | 8820 | 8.3%
2020 | 8820 | 8.3%
2019 | 8820 | 8.3%
2018 | 8820 | 8.3%
2017 | 8820 | 8.3%
2016 | 8820 | 8.3%
2015 | 8820 | 8.3%
2014 | 8820 | 8.3%
2013 | 8820 | 8.3%
2012 | 8820 | 8.3%

GEO
Categorical

HIGH CORRELATION  UNIFORM 

Distinct14
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size827.0 KiB
Canada
7560 
Newfoundland and Labrador
7560 
Prince Edward Island
7560 
Nova Scotia
7560 
New Brunswick
7560 
Other values (9)
68040 

Length

Max length25
Median length14.5
Mean length11.714286
Min length5

Characters and Unicode

Total characters1239840
Distinct characters34
Distinct categories3
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowCanada
2nd rowCanada
3rd rowCanada
4th rowCanada
5th rowCanada

Common Values

ValueCountFrequency (%)
Canada 7560
 
7.1%
Newfoundland and Labrador 7560
 
7.1%
Prince Edward Island 7560
 
7.1%
Nova Scotia 7560
 
7.1%
New Brunswick 7560
 
7.1%
Quebec 7560
 
7.1%
Ontario 7560
 
7.1%
Manitoba 7560
 
7.1%
Saskatchewan 7560
 
7.1%
Alberta 7560
 
7.1%
Other values (4) 30240
28.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
canada 7560
 
4.5%
newfoundland 7560
 
4.5%
territories 7560
 
4.5%
northwest 7560
 
4.5%
yukon 7560
 
4.5%
columbia 7560
 
4.5%
british 7560
 
4.5%
alberta 7560
 
4.5%
saskatchewan 7560
 
4.5%
manitoba 7560
 
4.5%
Other values (12) 90720
54.5%

Most occurring characters

ValueCountFrequency (%)
a 151200
 
12.2%
n 90720
 
7.3%
r 90720
 
7.3%
t 75600
 
6.1%
e 75600
 
6.1%
o 75600
 
6.1%
i 75600
 
6.1%
d 60480
 
4.9%
60480
 
4.9%
u 52920
 
4.3%
Other values (24) 430920
34.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1020600
82.3%
Uppercase Letter 158760
 
12.8%
Space Separator 60480
 
4.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 151200
14.8%
n 90720
 
8.9%
r 90720
 
8.9%
t 75600
 
7.4%
e 75600
 
7.4%
o 75600
 
7.4%
i 75600
 
7.4%
d 60480
 
5.9%
u 52920
 
5.2%
s 45360
 
4.4%
Other values (9) 226800
22.2%
Uppercase Letter
ValueCountFrequency (%)
N 37800
23.8%
B 15120
 
9.5%
S 15120
 
9.5%
C 15120
 
9.5%
I 7560
 
4.8%
E 7560
 
4.8%
P 7560
 
4.8%
L 7560
 
4.8%
Q 7560
 
4.8%
O 7560
 
4.8%
Other values (4) 30240
19.0%
Space Separator
ValueCountFrequency (%)
60480
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1179360
95.1%
Common 60480
 
4.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 151200
 
12.8%
n 90720
 
7.7%
r 90720
 
7.7%
t 75600
 
6.4%
e 75600
 
6.4%
o 75600
 
6.4%
i 75600
 
6.4%
d 60480
 
5.1%
u 52920
 
4.5%
s 45360
 
3.8%
Other values (23) 385560
32.7%
Common
ValueCountFrequency (%)
60480
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1239840
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 151200
 
12.2%
n 90720
 
7.3%
r 90720
 
7.3%
t 75600
 
6.1%
e 75600
 
6.1%
o 75600
 
6.1%
i 75600
 
6.1%
d 60480
 
4.9%
60480
 
4.9%
u 52920
 
4.3%
Other values (24) 430920
34.8%

DGUID
Categorical

HIGH CORRELATION  UNIFORM 

Distinct14
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size827.0 KiB
2016A000011124
7560 
2016A000210
7560 
2016A000211
7560 
2016A000212
7560 
2016A000213
7560 
Other values (9)
68040 

Length

Max length14
Median length11
Mean length11.214286
Min length11

Characters and Unicode

Total characters1186920
Distinct characters11
Distinct categories2
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st row2016A000011124
2nd row2016A000011124
3rd row2016A000011124
4th row2016A000011124
5th row2016A000011124

Common Values

ValueCountFrequency (%)
2016A000011124 7560
 
7.1%
2016A000210 7560
 
7.1%
2016A000211 7560
 
7.1%
2016A000212 7560
 
7.1%
2016A000213 7560
 
7.1%
2016A000224 7560
 
7.1%
2016A000235 7560
 
7.1%
2016A000246 7560
 
7.1%
2016A000247 7560
 
7.1%
2016A000248 7560
 
7.1%
Other values (4) 30240
28.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2016a000011124 7560
 
7.1%
2016a000210 7560
 
7.1%
2016a000211 7560
 
7.1%
2016a000212 7560
 
7.1%
2016a000213 7560
 
7.1%
2016a000224 7560
 
7.1%
2016a000235 7560
 
7.1%
2016a000246 7560
 
7.1%
2016a000247 7560
 
7.1%
2016a000248 7560
 
7.1%
Other values (4) 30240
28.6%

Most occurring characters

ValueCountFrequency (%)
0 446040
37.6%
2 234360
19.7%
1 173880
 
14.6%
6 136080
 
11.5%
A 105840
 
8.9%
4 37800
 
3.2%
3 15120
 
1.3%
5 15120
 
1.3%
7 7560
 
0.6%
8 7560
 
0.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1081080
91.1%
Uppercase Letter 105840
 
8.9%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 446040
41.3%
2 234360
21.7%
1 173880
 
16.1%
6 136080
 
12.6%
4 37800
 
3.5%
3 15120
 
1.4%
5 15120
 
1.4%
7 7560
 
0.7%
8 7560
 
0.7%
9 7560
 
0.7%
Uppercase Letter
ValueCountFrequency (%)
A 105840
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 1081080
91.1%
Latin 105840
 
8.9%

Most frequent character per script

Common
ValueCountFrequency (%)
0 446040
41.3%
2 234360
21.7%
1 173880
 
16.1%
6 136080
 
12.6%
4 37800
 
3.5%
3 15120
 
1.4%
5 15120
 
1.4%
7 7560
 
0.7%
8 7560
 
0.7%
9 7560
 
0.7%
Latin
ValueCountFrequency (%)
A 105840
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1186920
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 446040
37.6%
2 234360
19.7%
1 173880
 
14.6%
6 136080
 
11.5%
A 105840
 
8.9%
4 37800
 
3.2%
3 15120
 
1.3%
5 15120
 
1.3%
7 7560
 
0.6%
8 7560
 
0.6%

Sector
Categorical

UNIFORM 

Distinct5
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size827.0 KiB
Total non-profit institutions
21168 
Total non-profit institutions excluding governments
21168 
Non-profit institutions serving households (community organizations)
21168 
Business non-profit institutions
21168 
Government non-profit institutions
21168 

Length

Max length68
Median length34
Mean length42.8
Min length29
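
The length statistics for Sector can be checked by hand, because the variable has only five labels and each occurs 21168 times. The following is a minimal sketch rather than the profiler's own code; the names `sector_labels` and `rows_per_label` are illustrative.

```python
from statistics import mean, median

# The five Sector labels listed in this report; each occurs 21168 times.
sector_labels = [
    "Total non-profit institutions",
    "Total non-profit institutions excluding governments",
    "Non-profit institutions serving households (community organizations)",
    "Business non-profit institutions",
    "Government non-profit institutions",
]
rows_per_label = 21168

# Every label is equally frequent, so unweighted statistics over the five
# labels equal the statistics over all rows.
lengths = [len(label) for label in sector_labels]

max_length = max(lengths)
median_length = median(lengths)
mean_length = mean(lengths)
min_length = min(lengths)
total_characters = sum(lengths) * rows_per_label

print(max_length, median_length, mean_length, min_length, total_characters)
```

Because the labels are uniformly distributed, no weighting by frequency is needed; for a non-uniform variable each length would have to be weighted by its count.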

Characters and Unicode

Total characters4529952
Distinct characters29
Distinct categories6
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
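
As an illustration of how such character properties can be tallied, the sketch below uses Python's standard `unicodedata` module to count general categories over the five Sector labels. It is an assumption that this mirrors what the profiler does internally; the two-letter codes are the Unicode abbreviations (`Ll` lowercase letter, `Lu` uppercase letter, `Zs` space separator, `Pd` dash punctuation, `Ps` open punctuation, `Pe` close punctuation).

```python
import unicodedata
from collections import Counter

# The five Sector labels listed in this report; each occurs 21168 times.
sector_labels = [
    "Total non-profit institutions",
    "Total non-profit institutions excluding governments",
    "Non-profit institutions serving households (community organizations)",
    "Business non-profit institutions",
    "Government non-profit institutions",
]
rows_per_label = 21168

# Count the Unicode general category of every character, weighted by how
# often each label occurs in the dataset.
category_counts = Counter()
for label in sector_labels:
    for ch in label:
        category_counts[unicodedata.category(ch)] += rows_per_label

for category, count in category_counts.most_common():
    print(category, count)
```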

Unique

Unique0
Unique (%)0.0%

Sample

1st rowTotal non-profit institutions
2nd rowTotal non-profit institutions
3rd rowTotal non-profit institutions
4th rowTotal non-profit institutions
5th rowTotal non-profit institutions

Common Values

ValueCountFrequency (%)
Total non-profit institutions 21168
20.0%
Total non-profit institutions excluding governments 21168
20.0%
Non-profit institutions serving households (community organizations) 21168
20.0%
Business non-profit institutions 21168
20.0%
Government non-profit institutions 21168
20.0%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: common values plot]
ValueCountFrequency (%)
non-profit 105840
25.0%
institutions 105840
25.0%
total 42336
 
10.0%
excluding 21168
 
5.0%
governments 21168
 
5.0%
serving 21168
 
5.0%
households 21168
 
5.0%
community 21168
 
5.0%
organizations 21168
 
5.0%
business 21168
 
5.0%

Most occurring characters

ValueCountFrequency (%)
n 613872
13.6%
i 550368
12.1%
t 550368
12.1%
o 508032
11.2%
s 381024
 
8.4%
(space) 317520
 
7.0%
r 190512
 
4.2%
u 190512
 
4.2%
e 169344
 
3.7%
f 105840
 
2.3%
Other values (19) 952560
21.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3958416
87.4%
Space Separator 317520
 
7.0%
Dash Punctuation 105840
 
2.3%
Uppercase Letter 105840
 
2.3%
Open Punctuation 21168
 
0.5%
Close Punctuation 21168
 
0.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 613872
15.5%
i 550368
13.9%
t 550368
13.9%
o 508032
12.8%
s 381024
9.6%
r 190512
 
4.8%
u 190512
 
4.8%
e 169344
 
4.3%
f 105840
 
2.7%
p 105840
 
2.7%
Other values (11) 592704
15.0%
Uppercase Letter
ValueCountFrequency (%)
T 42336
40.0%
N 21168
20.0%
B 21168
20.0%
G 21168
20.0%
Space Separator
ValueCountFrequency (%)
(space) 317520
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 105840
100.0%
Open Punctuation
ValueCountFrequency (%)
( 21168
100.0%
Close Punctuation
ValueCountFrequency (%)
) 21168
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 4064256
89.7%
Common 465696
 
10.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 613872
15.1%
i 550368
13.5%
t 550368
13.5%
o 508032
12.5%
s 381024
9.4%
r 190512
 
4.7%
u 190512
 
4.7%
e 169344
 
4.2%
f 105840
 
2.6%
p 105840
 
2.6%
Other values (15) 698544
17.2%
Common
ValueCountFrequency (%)
(space) 317520
68.2%
- 105840
 
22.7%
( 21168
 
4.5%
) 21168
 
4.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4529952
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 613872
13.6%
i 550368
12.1%
t 550368
12.1%
o 508032
11.2%
s 381024
 
8.4%
(space) 317520
 
7.0%
r 190512
 
4.2%
u 190512
 
4.2%
e 169344
 
3.7%
f 105840
 
2.3%
Other values (19) 952560
21.0%

Characteristics
Categorical

UNIFORM 

Distinct18
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size827.0 KiB
Male employees
 
5880
Female employees
 
5880
Immigrant employees
 
5880
Non-immigrant employees
 
5880
Indigenous identity employees
 
5880
Other values (13)
76440 

Length

Max length33
Median length28
Mean length19.5
Min length14

Characters and Unicode

Total characters2063880
Distinct characters37
Distinct categories5
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowMale employees
2nd rowMale employees
3rd rowMale employees
4th rowMale employees
5th rowMale employees

Common Values

ValueCountFrequency (%)
Male employees 5880
 
5.6%
Female employees 5880
 
5.6%
Immigrant employees 5880
 
5.6%
Non-immigrant employees 5880
 
5.6%
Indigenous identity employees 5880
 
5.6%
Non-indigenous identity employees 5880
 
5.6%
Visible minority 5880
 
5.6%
Not a visible minority 5880
 
5.6%
High school diploma and less 5880
 
5.6%
Trade certificate 5880
 
5.6%
Other values (8) 47040
44.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
employees 35280
 
10.3%
years 35280
 
10.3%
to 29400
 
8.6%
and 17640
 
5.2%
identity 11760
 
3.4%
visible 11760
 
3.4%
minority 11760
 
3.4%
diploma 11760
 
3.4%
male 5880
 
1.7%
34 5880
 
1.7%
Other values (28) 164640
48.3%

Most occurring characters

ValueCountFrequency (%)
e 264600
12.8%
(space) 235200
 
11.4%
i 152880
 
7.4%
o 147000
 
7.1%
s 117600
 
5.7%
a 105840
 
5.1%
l 99960
 
4.8%
y 99960
 
4.8%
t 99960
 
4.8%
r 94080
 
4.6%
Other values (27) 646800
31.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1617000
78.3%
Space Separator 235200
 
11.4%
Decimal Number 129360
 
6.3%
Uppercase Letter 70560
 
3.4%
Dash Punctuation 11760
 
0.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 264600
16.4%
i 152880
9.5%
o 147000
9.1%
s 117600
 
7.3%
a 105840
 
6.5%
l 99960
 
6.2%
y 99960
 
6.2%
t 99960
 
6.2%
r 94080
 
5.8%
n 94080
 
5.8%
Other values (10) 341040
21.1%
Uppercase Letter
ValueCountFrequency (%)
N 17640
25.0%
I 11760
16.7%
H 5880
 
8.3%
T 5880
 
8.3%
C 5880
 
8.3%
U 5880
 
8.3%
V 5880
 
8.3%
F 5880
 
8.3%
M 5880
 
8.3%
Decimal Number
ValueCountFrequency (%)
5 47040
36.4%
4 41160
31.8%
2 11760
 
9.1%
3 11760
 
9.1%
6 11760
 
9.1%
1 5880
 
4.5%
Space Separator
ValueCountFrequency (%)
(space) 235200
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 11760
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1687560
81.8%
Common 376320
 
18.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 264600
15.7%
i 152880
 
9.1%
o 147000
 
8.7%
s 117600
 
7.0%
a 105840
 
6.3%
l 99960
 
5.9%
y 99960
 
5.9%
t 99960
 
5.9%
r 94080
 
5.6%
n 94080
 
5.6%
Other values (19) 411600
24.4%
Common
ValueCountFrequency (%)
(space) 235200
62.5%
5 47040
 
12.5%
4 41160
 
10.9%
2 11760
 
3.1%
3 11760
 
3.1%
- 11760
 
3.1%
6 11760
 
3.1%
1 5880
 
1.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2063880
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 264600
12.8%
(space) 235200
 
11.4%
i 152880
 
7.4%
o 147000
 
7.1%
s 117600
 
5.7%
a 105840
 
5.1%
l 99960
 
4.8%
y 99960
 
4.8%
t 99960
 
4.8%
r 94080
 
4.6%
Other values (27) 646800
31.3%

Indicators
Categorical

HIGH CORRELATION  UNIFORM 

Distinct7
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size827.0 KiB
Number of jobs
15120 
Hours worked
15120 
Wages and salaries
15120 
Average annual hours worked
15120 
Average weekly hours worked
15120 
Other values (2)
30240 

Length

Max length33
Median length19
Mean length21.428571
Min length12

Characters and Unicode

Total characters2268000
Distinct characters25
Distinct categories3
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowNumber of jobs
2nd rowHours worked
3rd rowWages and salaries
4th rowAverage annual hours worked
5th rowAverage weekly hours worked

Common Values

ValueCountFrequency (%)
Number of jobs 15120
14.3%
Hours worked 15120
14.3%
Wages and salaries 15120
14.3%
Average annual hours worked 15120
14.3%
Average weekly hours worked 15120
14.3%
Average annual wages and salaries 15120
14.3%
Average hourly wage 15120
14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: common values plot]
ValueCountFrequency (%)
average 60480
16.7%
hours 45360
12.5%
worked 45360
12.5%
wages 30240
8.3%
and 30240
8.3%
salaries 30240
8.3%
annual 30240
8.3%
number 15120
 
4.2%
of 15120
 
4.2%
jobs 15120
 
4.2%
Other values (3) 45360
12.5%

Most occurring characters

ValueCountFrequency (%)
e 287280
12.7%
a 257040
11.3%
(space) 257040
11.3%
r 211680
 
9.3%
s 151200
 
6.7%
o 136080
 
6.0%
u 105840
 
4.7%
g 105840
 
4.7%
n 90720
 
4.0%
l 90720
 
4.0%
Other values (15) 574560
25.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1905120
84.0%
Space Separator 257040
 
11.3%
Uppercase Letter 105840
 
4.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 287280
15.1%
a 257040
13.5%
r 211680
11.1%
s 151200
 
7.9%
o 136080
 
7.1%
u 105840
 
5.6%
g 105840
 
5.6%
n 90720
 
4.8%
l 90720
 
4.8%
w 90720
 
4.8%
Other values (10) 378000
19.8%
Uppercase Letter
ValueCountFrequency (%)
A 60480
57.1%
W 15120
 
14.3%
H 15120
 
14.3%
N 15120
 
14.3%
Space Separator
ValueCountFrequency (%)
(space) 257040
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2010960
88.7%
Common 257040
 
11.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 287280
14.3%
a 257040
12.8%
r 211680
10.5%
s 151200
 
7.5%
o 136080
 
6.8%
u 105840
 
5.3%
g 105840
 
5.3%
n 90720
 
4.5%
l 90720
 
4.5%
w 90720
 
4.5%
Other values (14) 483840
24.1%
Common
ValueCountFrequency (%)
(space) 257040
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2268000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 287280
12.7%
a 257040
11.3%
(space) 257040
11.3%
r 211680
 
9.3%
s 151200
 
6.7%
o 136080
 
6.0%
u 105840
 
4.7%
g 105840
 
4.7%
n 90720
 
4.0%
l 90720
 
4.0%
Other values (15) 574560
25.3%

UOM
Categorical

HIGH CORRELATION 

Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size827.0 KiB
Hours
45360 
Dollars
45360 
Jobs
15120 

Length

Max length7
Median length5
Mean length5.7142857
Min length4

Characters and Unicode

Total characters604800
Distinct characters10
Distinct categories2
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowJobs
2nd rowHours
3rd rowDollars
4th rowHours
5th rowHours

Common Values

ValueCountFrequency (%)
Hours 45360
42.9%
Dollars 45360
42.9%
Jobs 15120
 
14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: common values plot]
ValueCountFrequency (%)
hours 45360
42.9%
dollars 45360
42.9%
jobs 15120
 
14.3%

Most occurring characters

ValueCountFrequency (%)
o 105840
17.5%
s 105840
17.5%
r 90720
15.0%
l 90720
15.0%
H 45360
7.5%
u 45360
7.5%
D 45360
7.5%
a 45360
7.5%
J 15120
 
2.5%
b 15120
 
2.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 498960
82.5%
Uppercase Letter 105840
 
17.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 105840
21.2%
s 105840
21.2%
r 90720
18.2%
l 90720
18.2%
u 45360
9.1%
a 45360
9.1%
b 15120
 
3.0%
Uppercase Letter
ValueCountFrequency (%)
H 45360
42.9%
D 45360
42.9%
J 15120
 
14.3%

Most occurring scripts

ValueCountFrequency (%)
Latin 604800
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 105840
17.5%
s 105840
17.5%
r 90720
15.0%
l 90720
15.0%
H 45360
7.5%
u 45360
7.5%
D 45360
7.5%
a 45360
7.5%
J 15120
 
2.5%
b 15120
 
2.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 604800
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 105840
17.5%
s 105840
17.5%
r 90720
15.0%
l 90720
15.0%
H 45360
7.5%
u 45360
7.5%
D 45360
7.5%
a 45360
7.5%
J 15120
 
2.5%
b 15120
 
2.5%

UOM_ID
Categorical

HIGH CORRELATION 

Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size827.0 KiB
152
45360 
81
45360 
190
15120 

Length

Max length3
Median length3
Mean length2.5714286
Min length2

Characters and Unicode

Total characters272160
Distinct characters6
Distinct categories1
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st row190
2nd row152
3rd row81
4th row152
5th row152

Common Values

ValueCountFrequency (%)
152 45360
42.9%
81 45360
42.9%
190 15120
 
14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: common values plot]
ValueCountFrequency (%)
152 45360
42.9%
81 45360
42.9%
190 15120
 
14.3%

Most occurring characters

ValueCountFrequency (%)
1 105840
38.9%
5 45360
16.7%
2 45360
16.7%
8 45360
16.7%
9 15120
 
5.6%
0 15120
 
5.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 272160
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 105840
38.9%
5 45360
16.7%
2 45360
16.7%
8 45360
16.7%
9 15120
 
5.6%
0 15120
 
5.6%

Most occurring scripts

ValueCountFrequency (%)
Common 272160
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1 105840
38.9%
5 45360
16.7%
2 45360
16.7%
8 45360
16.7%
9 15120
 
5.6%
0 15120
 
5.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 272160
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 105840
38.9%
5 45360
16.7%
2 45360
16.7%
8 45360
16.7%
9 15120
 
5.6%
0 15120
 
5.6%

SCALAR_FACTOR
Categorical

HIGH CORRELATION 

Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size827.0 KiB
units
75600 
thousands
15120 
millions
15120 

Length

Max length9
Median length5
Mean length6
Min length5

Characters and Unicode

Total characters635040
Distinct characters11
Distinct categories1
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowunits
2nd rowthousands
3rd rowmillions
4th rowunits
5th rowunits

Common Values

ValueCountFrequency (%)
units 75600
71.4%
thousands 15120
 
14.3%
millions 15120
 
14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: common values plot]
ValueCountFrequency (%)
units 75600
71.4%
thousands 15120
 
14.3%
millions 15120
 
14.3%

Most occurring characters

ValueCountFrequency (%)
s 120960
19.0%
n 105840
16.7%
i 105840
16.7%
u 90720
14.3%
t 90720
14.3%
o 30240
 
4.8%
l 30240
 
4.8%
h 15120
 
2.4%
a 15120
 
2.4%
d 15120
 
2.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 635040
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
s 120960
19.0%
n 105840
16.7%
i 105840
16.7%
u 90720
14.3%
t 90720
14.3%
o 30240
 
4.8%
l 30240
 
4.8%
h 15120
 
2.4%
a 15120
 
2.4%
d 15120
 
2.4%

Most occurring scripts

ValueCountFrequency (%)
Latin 635040
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
s 120960
19.0%
n 105840
16.7%
i 105840
16.7%
u 90720
14.3%
t 90720
14.3%
o 30240
 
4.8%
l 30240
 
4.8%
h 15120
 
2.4%
a 15120
 
2.4%
d 15120
 
2.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 635040
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
s 120960
19.0%
n 105840
16.7%
i 105840
16.7%
u 90720
14.3%
t 90720
14.3%
o 30240
 
4.8%
l 30240
 
4.8%
h 15120
 
2.4%
a 15120
 
2.4%
d 15120
 
2.4%

SCALAR_ID
Categorical

HIGH CORRELATION 

Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size827.0 KiB
0
75600 
3
15120 
6
15120 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters105840
Distinct characters3
Distinct categories1
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st row0
2nd row3
3rd row6
4th row0
5th row0

Common Values

ValueCountFrequency (%)
0 75600
71.4%
3 15120
 
14.3%
6 15120
 
14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: common values plot]
ValueCountFrequency (%)
0 75600
71.4%
3 15120
 
14.3%
6 15120
 
14.3%

Most occurring characters

ValueCountFrequency (%)
0 75600
71.4%
3 15120
 
14.3%
6 15120
 
14.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 105840
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 75600
71.4%
3 15120
 
14.3%
6 15120
 
14.3%

Most occurring scripts

ValueCountFrequency (%)
Common 105840
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 75600
71.4%
3 15120
 
14.3%
6 15120
 
14.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 105840
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 75600
71.4%
3 15120
 
14.3%
6 15120
 
14.3%

VECTOR
Text

Distinct8820
Distinct (%)8.3%
Missing0
Missing (%)0.0%
Memory size827.0 KiB
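
The distinct count of 8820 is what one would expect if every combination of geography, sector, characteristic and indicator forms one series that is observed once per reference year. The sketch below checks that arithmetic under that assumption (a fully crossed design, which the report itself does not state); the variable names are illustrative and the counts are the distinct values reported for each column.

```python
# Distinct counts reported for each dimension of the table.
n_geo = 14              # distinct DGUID values
n_sector = 5
n_characteristics = 18
n_indicators = 7
n_years = 12            # distinct REF_DATE values, 2010 to 2021

# One series per combination of dimensions, one row per series per year.
n_series = n_geo * n_sector * n_characteristics * n_indicators
n_rows = n_series * n_years
distinct_pct = 100 * n_series / n_rows

print(n_series, n_rows, round(distinct_pct, 1))
```

This also accounts for each identifier appearing exactly 12 times in the frequency table that follows.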

Length

Max length11
Median length11
Mean length11
Min length11

Characters and Unicode

Total characters1164240
Distinct characters11
Distinct categories2
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowv1273033811
2nd rowv1273033812
3rd rowv1273033813
4th rowv1273033814
5th rowv1273033815
ValueCountFrequency (%)
v1273033811 12
 
< 0.1%
v1273033912 12
 
< 0.1%
v1273034009 12
 
< 0.1%
v1273034008 12
 
< 0.1%
v1273034007 12
 
< 0.1%
v1273033915 12
 
< 0.1%
v1273033914 12
 
< 0.1%
v1273033913 12
 
< 0.1%
v1273033911 12
 
< 0.1%
v1273034698 12
 
< 0.1%
Other values (8810) 105720
99.9%

Most occurring characters

ValueCountFrequency (%)
3 214332
18.4%
1 149892
12.9%
0 149784
12.9%
7 148584
12.8%
2 145476
12.5%
v 105840
9.1%
4 75516
 
6.5%
5 43944
 
3.8%
9 43944
 
3.8%
8 43812
 
3.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1058400
90.9%
Lowercase Letter 105840
 
9.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3 214332
20.3%
1 149892
14.2%
0 149784
14.2%
7 148584
14.0%
2 145476
13.7%
4 75516
 
7.1%
5 43944
 
4.2%
9 43944
 
4.2%
8 43812
 
4.1%
6 43116
 
4.1%
Lowercase Letter
ValueCountFrequency (%)
v 105840
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 1058400
90.9%
Latin 105840
 
9.1%

Most frequent character per script

Common
ValueCountFrequency (%)
3 214332
20.3%
1 149892
14.2%
0 149784
14.2%
7 148584
14.0%
2 145476
13.7%
4 75516
 
7.1%
5 43944
 
4.2%
9 43944
 
4.2%
8 43812
 
4.1%
6 43116
 
4.1%
Latin
ValueCountFrequency (%)
v 105840
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1164240
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3 214332
18.4%
1 149892
12.9%
0 149784
12.9%
7 148584
12.8%
2 145476
12.5%
v 105840
9.1%
4 75516
 
6.5%
5 43944
 
3.8%
9 43944
 
3.8%
8 43812
 
3.8%

COORDINATE
Text

Distinct8820
Distinct (%)8.3%
Missing0
Missing (%)0.0%
Memory size827.0 KiB

Length

Max length9
Median length8.5
Mean length7.8571429
Min length7

Characters and Unicode

Total characters831600
Distinct characters11
Distinct categories2
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st row1.1.1.1
2nd row1.1.1.2
3rd row1.1.1.3
4th row1.1.1.4
5th row1.1.1.5
ValueCountFrequency (%)
1.1.1.1 12
 
< 0.1%
1.1.2.4 12
 
< 0.1%
1.1.3.3 12
 
< 0.1%
1.1.3.2 12
 
< 0.1%
1.1.3.1 12
 
< 0.1%
1.1.2.7 12
 
< 0.1%
1.1.2.6 12
 
< 0.1%
1.1.2.5 12
 
< 0.1%
1.1.2.3 12
 
< 0.1%
1.1.10.6 12
 
< 0.1%
Other values (8810) 105720
99.9%

Most occurring characters

ValueCountFrequency (%)
. 317520
38.2%
1 153888
18.5%
2 63168
 
7.6%
3 63168
 
7.6%
4 63168
 
7.6%
5 55608
 
6.7%
6 34440
 
4.1%
7 34440
 
4.1%
8 19320
 
2.3%
0 13440
 
1.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 514080
61.8%
Other Punctuation 317520
38.2%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 153888
29.9%
2 63168
12.3%
3 63168
12.3%
4 63168
12.3%
5 55608
 
10.8%
6 34440
 
6.7%
7 34440
 
6.7%
8 19320
 
3.8%
0 13440
 
2.6%
9 13440
 
2.6%
Other Punctuation
ValueCountFrequency (%)
. 317520
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 831600
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
. 317520
38.2%
1 153888
18.5%
2 63168
 
7.6%
3 63168
 
7.6%
4 63168
 
7.6%
5 55608
 
6.7%
6 34440
 
4.1%
7 34440
 
4.1%
8 19320
 
2.3%
0 13440
 
1.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 831600
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
. 317520
38.2%
1 153888
18.5%
2 63168
 
7.6%
3 63168
 
7.6%
4 63168
 
7.6%
5 55608
 
6.7%
6 34440
 
4.1%
7 34440
 
4.1%
8 19320
 
2.3%
0 13440
 
1.6%

VALUE
Real number (ℝ)

MISSING 

Distinct35398
Distinct (%)34.4%
Missing3024
Missing (%)2.9%
Infinite0
Infinite (%)0.0%
Mean26419.543
Minimum0
Maximum3857813
Zeros9
Zeros (%)< 0.1%
Negative0
Negative (%)0.0%
Memory size827.0 KiB

Quantile statistics

Minimum0
5-th percentile18.6075
Q132
median1386
Q317660
95-th percentile85576.5
Maximum3857813
Range3857813
Interquartile range (IQR)17628

Descriptive statistics

Standard deviation118041.35
Coefficient of variation (CV)4.4679558
Kurtosis266.93641
Mean26419.543
Median Absolute Deviation (MAD)1358
Skewness13.831207
Sum2.7163518 × 10^9
Variance1.393376 × 10^10
MonotonicityNot monotonic
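
Several of the figures above are derived from one another, which gives a quick consistency check. The sketch below recomputes them from the reported summary values; it is not the profiler's code, the variable names are illustrative, and small differences are expected because the inputs are rounded.

```python
# Summary values as reported for VALUE.
mean_value = 26419.543
std_dev = 118041.35
q1, q3 = 32, 17660
minimum, maximum = 0, 3857813
n_rows, n_missing = 105840, 3024

iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range
cv = std_dev / mean_value        # coefficient of variation
variance = std_dev ** 2          # variance is the squared standard deviation
n_valid = n_rows - n_missing     # rows with a non-missing VALUE
total = mean_value * n_valid     # sum over the non-missing rows

print(iqr, value_range, round(cv, 4), f"{variance:.6e}", f"{total:.7e}")
```

A coefficient of variation above 4, together with the large skewness and kurtosis, reflects that the column mixes indicators on very different scales (job counts, hours, dollars).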
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
30 2040
 
1.9%
31 1949
 
1.8%
32 1844
 
1.7%
29 1562
 
1.5%
33 1476
 
1.4%
34 1005
 
0.9%
28 905
 
0.9%
35 638
 
0.6%
27 523
 
0.5%
36 457
 
0.4%
Other values (35388) 90417
85.4%
(Missing) 3024
 
2.9%
ValueCountFrequency (%)
0 9
 
< 0.1%
1 219
0.2%
2 299
0.3%
3 201
0.2%
4 203
0.2%
5 157
0.1%
6 147
0.1%
7 135
0.1%
8 179
0.2%
9 145
0.1%
ValueCountFrequency (%)
3857813 1
< 0.1%
3690238 1
< 0.1%
3643952 1
< 0.1%
3581317 1
< 0.1%
3556730 1
< 0.1%
3544822 1
< 0.1%
3487265 1
< 0.1%
3425218 1
< 0.1%
3402343 1
< 0.1%
3372266 1
< 0.1%

STATUS
Categorical

CONSTANT  MISSING 

Distinct1
Distinct (%)< 0.1%
Missing102816
Missing (%)97.1%
Memory size827.0 KiB
x
3024 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters3024
Distinct characters1
Distinct categories1
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowx
2nd rowx
3rd rowx
4th rowx
5th rowx

Common Values

ValueCountFrequency (%)
x 3024
 
2.9%
(Missing) 102816
97.1%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: common values plot]
ValueCountFrequency (%)
x 3024
100.0%

Most occurring characters

ValueCountFrequency (%)
x 3024
100.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3024
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
x 3024
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 3024
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
x 3024
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3024
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
x 3024
100.0%

SYMBOL
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing105840
Missing (%)100.0%
Memory size827.0 KiB

TERMINATED
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing105840
Missing (%)100.0%
Memory size827.0 KiB

DECIMALS
Categorical

HIGH CORRELATION 

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size827.0 KiB
0
90720 
2
15120 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters105840
Distinct characters2
Distinct categories1
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row0

Common Values

ValueCountFrequency (%)
0 90720
85.7%
2 15120
 
14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: common values plot]
ValueCountFrequency (%)
0 90720
85.7%
2 15120
 
14.3%

Most occurring characters

ValueCountFrequency (%)
0 90720
85.7%
2 15120
 
14.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 105840
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 90720
85.7%
2 15120
 
14.3%

Most occurring scripts

ValueCountFrequency (%)
Common 105840
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 90720
85.7%
2 15120
 
14.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 105840
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 90720
85.7%
2 15120
 
14.3%

Interactions

[Figure: four interaction plots]

Correlations

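Variable | REF_DATE | VALUE | GEO | DGUID | Sector | Characteristics | Indicators | UOM | UOM_ID | SCALAR_FACTOR | SCALAR_ID | DECIMALS
REF_DATE | 1.000 | 0.027 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
VALUE | 0.027 | 1.000 | 0.090 | 0.090 | 0.051 | 0.044 | 0.075 | 0.081 | 0.081 | 0.105 | 0.105 | 0.044
GEO | 0.000 | 0.090 | 1.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
DGUID | 0.000 | 0.090 | 1.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
Sector | 0.000 | 0.051 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
Characteristics | 0.000 | 0.044 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
Indicators | 0.000 | 0.075 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
UOM | 0.000 | 0.081 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000 | 1.000 | 0.447 | 0.447 | 0.471
UOM_ID | 0.000 | 0.081 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000 | 1.000 | 0.447 | 0.447 | 0.471
SCALAR_FACTOR | 0.000 | 0.105 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.447 | 0.447 | 1.000 | 1.000 | 0.258
SCALAR_ID | 0.000 | 0.105 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.447 | 0.447 | 1.000 | 1.000 | 0.258
DECIMALS | 0.000 | 0.044 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.471 | 0.471 | 0.258 | 0.258 | 1.000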
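
The report does not say which association measure fills this matrix, but the entries between categorical variables are consistent with Cramér's V. The sketch below is an uncorrected, standard-library implementation and should be read as an illustration, not as the profiler's method, which may apply a bias correction that is negligible at this sample size. The contingency tables are an assumption built from the unit, scalar factor and decimals shown for each of the seven indicators in the sample rows, with 15120 rows per indicator.

```python
from math import sqrt


def cramers_v(table):
    """Uncorrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return sqrt(chi2 / (n * k))


block = 15120  # rows per indicator

# Rows: UOM (Hours, Dollars, Jobs); columns: SCALAR_FACTOR (units, thousands, millions).
uom_by_scalar = [
    [2 * block, 1 * block, 0],
    [2 * block, 0, 1 * block],
    [1 * block, 0, 0],
]
# Rows: UOM (Hours, Dollars, Jobs); columns: DECIMALS (0, 2).
uom_by_decimals = [
    [3 * block, 0],
    [2 * block, 1 * block],
    [1 * block, 0],
]
# Rows: SCALAR_FACTOR (units, thousands, millions); columns: DECIMALS (0, 2).
scalar_by_decimals = [
    [4 * block, 1 * block],
    [1 * block, 0],
    [1 * block, 0],
]

for name, table in [
    ("UOM vs SCALAR_FACTOR", uom_by_scalar),
    ("UOM vs DECIMALS", uom_by_decimals),
    ("SCALAR_FACTOR vs DECIMALS", scalar_by_decimals),
]:
    print(name, round(cramers_v(table), 3))
```

The values of 1.000 between Indicators and the unit columns follow from the same reasoning: each indicator fixes its unit, scalar factor and decimals, so those columns are fully determined by Indicators.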

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
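
The overall missing-cell figures in the dataset statistics can be rebuilt from the per-column missing counts reported in the variable sections. The following is a minimal sketch with illustrative names; it assumes the four columns listed are the only ones with missing values, which matches the alerts in this report.

```python
# Missing counts per column, as reported in the variable sections.
missing_by_column = {
    "VALUE": 3024,
    "STATUS": 102816,
    "SYMBOL": 105840,
    "TERMINATED": 105840,
}
n_rows = 105840
n_columns = 17

missing_cells = sum(missing_by_column.values())
missing_pct = 100 * missing_cells / (n_rows * n_columns)

# VALUE and STATUS together account for exactly one missing cell per row.
value_plus_status = missing_by_column["VALUE"] + missing_by_column["STATUS"]

print(missing_cells, round(missing_pct, 1), value_plus_status == n_rows)
```

The last check suggests, without proving it, that STATUS is populated exactly where VALUE is missing; confirming that would require the row-level data.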

Sample

REF_DATEGEODGUIDSectorCharacteristicsIndicatorsUOMUOM_IDSCALAR_FACTORSCALAR_IDVECTORCOORDINATEVALUESTATUSSYMBOLTERMINATEDDECIMALS
02010Canada2016A000011124Total non-profit institutionsMale employeesNumber of jobsJobs190units0v12730338111.1.1.1642584.00NaNNaNNaN0
12010Canada2016A000011124Total non-profit institutionsMale employeesHours workedHours152thousands3v12730338121.1.1.21048516.00NaNNaNNaN0
22010Canada2016A000011124Total non-profit institutionsMale employeesWages and salariesDollars81millions6v12730338131.1.1.330805.00NaNNaNNaN0
32010Canada2016A000011124Total non-profit institutionsMale employeesAverage annual hours workedHours152units0v12730338141.1.1.41632.00NaNNaNNaN0
42010Canada2016A000011124Total non-profit institutionsMale employeesAverage weekly hours workedHours152units0v12730338151.1.1.531.00NaNNaNNaN0
52010Canada2016A000011124Total non-profit institutionsMale employeesAverage annual wages and salariesDollars81units0v12730338161.1.1.647940.00NaNNaNNaN0
62010Canada2016A000011124Total non-profit institutionsMale employeesAverage hourly wageDollars81units0v12730338171.1.1.729.38NaNNaNNaN2
72010Canada2016A000011124Total non-profit institutionsFemale employeesNumber of jobsJobs190units0v12730339091.1.2.11500394.00NaNNaNNaN0
82010Canada2016A000011124Total non-profit institutionsFemale employeesHours workedHours152thousands3v12730339101.1.2.22331018.00NaNNaNNaN0
92010Canada2016A000011124Total non-profit institutionsFemale employeesWages and salariesDollars81millions6v12730339111.1.2.360943.00NaNNaNNaN0
index | REF_DATE | GEO | DGUID | Sector | Characteristics | Indicators | UOM | UOM_ID | SCALAR_FACTOR | SCALAR_ID | VECTOR | COORDINATE | VALUE | STATUS | SYMBOL | TERMINATED | DECIMALS
105830 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 55 to 64 years | Average weekly hours worked | Hours | 152 | units | 0 | v1273042530 | 14.5.17.5 | 33.00 | NaN | NaN | NaN | 0
105831 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 55 to 64 years | Average annual wages and salaries | Dollars | 81 | units | 0 | v1273042531 | 14.5.17.6 | 101380.00 | NaN | NaN | NaN | 0
105832 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 55 to 64 years | Average hourly wage | Dollars | 81 | units | 0 | v1273042532 | 14.5.17.7 | 59.98 | NaN | NaN | NaN | 2
105833 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 65 years old and over | Number of jobs | Jobs | 190 | units | 0 | v1273042624 | 14.5.18.1 | 27.00 | NaN | NaN | NaN | 0
105834 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 65 years old and over | Hours worked | Hours | 152 | thousands | 3 | v1273042625 | 14.5.18.2 | 30.00 | NaN | NaN | NaN | 0
105835 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 65 years old and over | Wages and salaries | Dollars | 81 | millions | 6 | v1273042626 | 14.5.18.3 | 2.00 | NaN | NaN | NaN | 0
105836 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 65 years old and over | Average annual hours worked | Hours | 152 | units | 0 | v1273042627 | 14.5.18.4 | 1111.00 | NaN | NaN | NaN | 0
105837 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 65 years old and over | Average weekly hours worked | Hours | 152 | units | 0 | v1273042628 | 14.5.18.5 | 21.00 | NaN | NaN | NaN | 0
105838 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 65 years old and over | Average annual wages and salaries | Dollars | 81 | units | 0 | v1273042629 | 14.5.18.6 | 74037.00 | NaN | NaN | NaN | 0
105839 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 65 years old and over | Average hourly wage | Dollars | 81 | units | 0 | v1273042630 | 14.5.18.7 | 66.63 | NaN | NaN | NaN | 2

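The sample rows pair SCALAR_FACTOR labels (units, thousands, millions) with SCALAR_ID values (0, 3, 6). A minimal sketch of how VALUE might be converted to base units follows, assuming SCALAR_ID is the power of ten named by SCALAR_FACTOR. That reading is an assumption drawn from the sample, not something the report states, and the helper `to_base_units` is hypothetical.

```python
# Values copied from the first three sample rows (Canada, male employees, 2010).
sample_rows = [
    {"Indicators": "Number of jobs", "SCALAR_ID": 0, "VALUE": 642584.0},
    {"Indicators": "Hours worked", "SCALAR_ID": 3, "VALUE": 1048516.0},
    {"Indicators": "Wages and salaries", "SCALAR_ID": 6, "VALUE": 30805.0},
]

def to_base_units(row):
    # Assumption: SCALAR_ID is the exponent of ten implied by SCALAR_FACTOR.
    return row["VALUE"] * 10 ** row["SCALAR_ID"]

scaled = {row["Indicators"]: to_base_units(row) for row in sample_rows}

# Derived ratios, to compare against the "Average ..." indicators of the same group.
annual_hours = scaled["Hours worked"] / scaled["Number of jobs"]
hourly_wage = scaled["Wages and salaries"] / scaled["Hours worked"]
print(round(annual_hours), round(hourly_wage, 2))
```

Under this assumption the derived figures line up with the sample's reported average annual hours worked (1632) and average hourly wage (29.38) for the same group. Other ratios can differ slightly because the published totals are rounded.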
Pandas Profiling Report

Overview

Dataset statistics

Number of variables: 17
Number of observations: 105840
Missing cells: 317520
Missing cells (%): 17.6%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 13.7 MiB
Average record size in memory: 136.0 B
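The missing-cell figures can be cross-checked against the per-column missing counts listed under Alerts. The short sketch below does that arithmetic; the variable names are illustrative only.

```python
rows = 105840
columns = 17

# Per-column missing counts as listed in the Alerts section of this report.
missing_by_column = {
    "VALUE": 3024,
    "STATUS": 102816,
    "SYMBOL": 105840,
    "TERMINATED": 105840,
}

missing_cells = sum(missing_by_column.values())
missing_pct = 100 * missing_cells / (rows * columns)
print(missing_cells, round(missing_pct, 1))
```

The four columns account for all 317520 missing cells, so the remaining 13 columns are complete.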

Variable types

Numeric: 2
Categorical: 11
Text: 2
Unsupported: 2

Alerts

STATUS has constant value "" (Constant)
GEO is highly overall correlated with DGUID (High correlation)
DGUID is highly overall correlated with GEO (High correlation)
Indicators is highly overall correlated with UOM and 4 other fields (High correlation)
UOM is highly overall correlated with Indicators and 1 other field (High correlation)
UOM_ID is highly overall correlated with Indicators and 1 other field (High correlation)
SCALAR_FACTOR is highly overall correlated with Indicators and 1 other field (High correlation)
SCALAR_ID is highly overall correlated with Indicators and 1 other field (High correlation)
DECIMALS is highly overall correlated with Indicators (High correlation)
VALUE has 3024 (2.9%) missing values (Missing)
STATUS has 102816 (97.1%) missing values (Missing)
SYMBOL has 105840 (100.0%) missing values (Missing)
TERMINATED has 105840 (100.0%) missing values (Missing)
GEO is uniformly distributed (Uniform)
DGUID is uniformly distributed (Uniform)
Sector is uniformly distributed (Uniform)
Characteristics is uniformly distributed (Uniform)
Indicators is uniformly distributed (Uniform)
SYMBOL is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
TERMINATED is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
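One plausible reading of the GEO/DGUID alert is that each geography label carries exactly one DGUID code, so one column is a relabelling of the other. The sketch below shows how such a one-to-one mapping could be checked. The helper `is_one_to_one` is hypothetical, and only the two label/code pairs visible in the sample rows are used.

```python
from collections import defaultdict

def is_one_to_one(pairs):
    """Return True if every left value maps to one right value and vice versa."""
    left_to_right = defaultdict(set)
    right_to_left = defaultdict(set)
    for left, right in pairs:
        left_to_right[left].add(right)
        right_to_left[right].add(left)
    return (all(len(v) == 1 for v in left_to_right.values())
            and all(len(v) == 1 for v in right_to_left.values()))

# (GEO, DGUID) pairs as they appear in the sample rows of this report.
pairs = [
    ("Canada", "2016A000011124"),
    ("Canada", "2016A000011124"),
    ("Nunavut", "2016A000262"),
]

print(is_one_to_one(pairs))                                # True
print(is_one_to_one(pairs + [("Canada", "2016A000262")]))  # False: Canada would map to two codes
```

If the full columns pass this check, one of the two could be dropped before modelling without losing information.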

Reproduction

Analysis started: 2023-12-20 15:50:38.150365
Analysis finished: 2023-12-20 15:50:45.955876
Duration: 7.81 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

REF_DATE
Real number (ℝ)

Distinct: 12
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2015.5
Minimum: 2010
Maximum: 2021
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 827.0 KiB

Quantile statistics

Minimum: 2010
5-th percentile: 2010
Q1: 2012.75
median: 2015.5
Q3: 2018.25
95-th percentile: 2021
Maximum: 2021
Range: 11
Interquartile range (IQR): 5.5

Descriptive statistics

Standard deviation: 3.4520688
Coefficient of variation (CV): 0.0017127605
Kurtosis: -1.216784
Mean: 2015.5
Median Absolute Deviation (MAD): 3
Skewness: 0
Sum: 2.1332052 × 10^8
Variance: 11.916779
Monotonicity: Increasing
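Because REF_DATE holds 12 years with 8820 rows each, most of these statistics can be reproduced by hand. The sketch below does so in plain Python, assuming the sample (n - 1) variance and linear interpolation between order statistics for the quartiles. Both are assumptions about how the report was computed, and the helper `percentile` is hypothetical.

```python
# Rebuild the column: 12 years (2010..2021), 8820 rows per year, already sorted.
values = [year for year in range(2010, 2022) for _ in range(8820)]
n = len(values)

def percentile(sorted_values, q):
    # Linear interpolation between the two nearest order statistics.
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])

total = sum(values)
mean = total / n
variance = sum((v - mean) ** 2 for v in values) / (n - 1)  # sample variance
std = variance ** 0.5
cv = std / mean
q1 = percentile(values, 0.25)
q3 = percentile(values, 0.75)

print(total, mean, round(variance, 6), round(std, 7), q1, q3, q3 - q1)
```

Under these assumptions the results agree with the reported sum, mean, variance, standard deviation, quartiles and IQR.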
Histogram with fixed size bins (bins=12)
Value | Count | Frequency (%)
2010 | 8820 | 8.3%
2011 | 8820 | 8.3%
2012 | 8820 | 8.3%
2013 | 8820 | 8.3%
2014 | 8820 | 8.3%
2015 | 8820 | 8.3%
2016 | 8820 | 8.3%
2017 | 8820 | 8.3%
2018 | 8820 | 8.3%
2019 | 8820 | 8.3%
Other values (2) | 17640 | 16.7%

Value | Count | Frequency (%)
2010 | 8820 | 8.3%
2011 | 8820 | 8.3%
2012 | 8820 | 8.3%
2013 | 8820 | 8.3%
2014 | 8820 | 8.3%
2015 | 8820 | 8.3%
2016 | 8820 | 8.3%
2017 | 8820 | 8.3%
2018 | 8820 | 8.3%
2019 | 8820 | 8.3%

Value | Count | Frequency (%)
2021 | 8820 | 8.3%
2020 | 8820 | 8.3%
2019 | 8820 | 8.3%
2018 | 8820 | 8.3%
2017 | 8820 | 8.3%
2016 | 8820 | 8.3%
2015 | 8820 | 8.3%
2014 | 8820 | 8.3%
2013 | 8820 | 8.3%
2012 | 8820 | 8.3%

GEO
Categorical

HIGH CORRELATION  UNIFORM 

Distinct: 14
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB
Canada: 7560
Newfoundland and Labrador: 7560
Prince Edward Island: 7560
Nova Scotia: 7560
New Brunswick: 7560
Other values (9): 68040

Length

Max length: 25
Median length: 14.5
Mean length: 11.714286
Min length: 5

Characters and Unicode

Total characters: 1239840
Distinct characters: 34
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
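As a small illustration of these character properties, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. The helper `category_counts` is hypothetical, and the three input strings are GEO labels taken from this report.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count Unicode general categories over every character of every string."""
    counts = Counter()
    for text in values:
        for ch in text:
            counts[unicodedata.category(ch)] += 1
    return counts

names = ["Canada", "Nova Scotia", "Prince Edward Island"]
counts = category_counts(names)

# Lu = Uppercase Letter, Ll = Lowercase Letter, Zs = Space Separator
print(dict(counts))
```

The two-letter codes correspond to the category names used in the tables of this section, for example `Ll` for "Lowercase Letter" and `Zs` for "Space Separator".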

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Canada
2nd row: Canada
3rd row: Canada
4th row: Canada
5th row: Canada

Common Values

Value | Count | Frequency (%)
Canada | 7560 | 7.1%
Newfoundland and Labrador | 7560 | 7.1%
Prince Edward Island | 7560 | 7.1%
Nova Scotia | 7560 | 7.1%
New Brunswick | 7560 | 7.1%
Quebec | 7560 | 7.1%
Ontario | 7560 | 7.1%
Manitoba | 7560 | 7.1%
Saskatchewan | 7560 | 7.1%
Alberta | 7560 | 7.1%
Other values (4) | 30240 | 28.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
canada 7560
 
4.5%
newfoundland 7560
 
4.5%
territories 7560
 
4.5%
northwest 7560
 
4.5%
yukon 7560
 
4.5%
columbia 7560
 
4.5%
british 7560
 
4.5%
alberta 7560
 
4.5%
saskatchewan 7560
 
4.5%
manitoba 7560
 
4.5%
Other values (12) 90720
54.5%

Most occurring characters

ValueCountFrequency (%)
a 151200
 
12.2%
n 90720
 
7.3%
r 90720
 
7.3%
t 75600
 
6.1%
e 75600
 
6.1%
o 75600
 
6.1%
i 75600
 
6.1%
d 60480
 
4.9%
60480
 
4.9%
u 52920
 
4.3%
Other values (24) 430920
34.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1020600
82.3%
Uppercase Letter 158760
 
12.8%
Space Separator 60480
 
4.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 151200
14.8%
n 90720
 
8.9%
r 90720
 
8.9%
t 75600
 
7.4%
e 75600
 
7.4%
o 75600
 
7.4%
i 75600
 
7.4%
d 60480
 
5.9%
u 52920
 
5.2%
s 45360
 
4.4%
Other values (9) 226800
22.2%
Uppercase Letter
ValueCountFrequency (%)
N 37800
23.8%
B 15120
 
9.5%
S 15120
 
9.5%
C 15120
 
9.5%
I 7560
 
4.8%
E 7560
 
4.8%
P 7560
 
4.8%
L 7560
 
4.8%
Q 7560
 
4.8%
O 7560
 
4.8%
Other values (4) 30240
19.0%
Space Separator
ValueCountFrequency (%)
60480
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1179360
95.1%
Common 60480
 
4.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 151200
 
12.8%
n 90720
 
7.7%
r 90720
 
7.7%
t 75600
 
6.4%
e 75600
 
6.4%
o 75600
 
6.4%
i 75600
 
6.4%
d 60480
 
5.1%
u 52920
 
4.5%
s 45360
 
3.8%
Other values (23) 385560
32.7%
Common
ValueCountFrequency (%)
60480
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1239840
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 151200
 
12.2%
n 90720
 
7.3%
r 90720
 
7.3%
t 75600
 
6.1%
e 75600
 
6.1%
o 75600
 
6.1%
i 75600
 
6.1%
d 60480
 
4.9%
60480
 
4.9%
u 52920
 
4.3%
Other values (24) 430920
34.8%

DGUID
Categorical

HIGH CORRELATION  UNIFORM 

Distinct: 14
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB
2016A000011124: 7560
2016A000210: 7560
2016A000211: 7560
2016A000212: 7560
2016A000213: 7560
Other values (9): 68040

Length

Max length: 14
Median length: 11
Mean length: 11.214286
Min length: 11

Characters and Unicode

Total characters: 1186920
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2016A000011124
2nd row: 2016A000011124
3rd row: 2016A000011124
4th row: 2016A000011124
5th row: 2016A000011124

Common Values

Value | Count | Frequency (%)
2016A000011124 | 7560 | 7.1%
2016A000210 | 7560 | 7.1%
2016A000211 | 7560 | 7.1%
2016A000212 | 7560 | 7.1%
2016A000213 | 7560 | 7.1%
2016A000224 | 7560 | 7.1%
2016A000235 | 7560 | 7.1%
2016A000246 | 7560 | 7.1%
2016A000247 | 7560 | 7.1%
2016A000248 | 7560 | 7.1%
Other values (4) | 30240 | 28.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2016a000011124 7560
 
7.1%
2016a000210 7560
 
7.1%
2016a000211 7560
 
7.1%
2016a000212 7560
 
7.1%
2016a000213 7560
 
7.1%
2016a000224 7560
 
7.1%
2016a000235 7560
 
7.1%
2016a000246 7560
 
7.1%
2016a000247 7560
 
7.1%
2016a000248 7560
 
7.1%
Other values (4) 30240
28.6%

Most occurring characters

ValueCountFrequency (%)
0 446040
37.6%
2 234360
19.7%
1 173880
 
14.6%
6 136080
 
11.5%
A 105840
 
8.9%
4 37800
 
3.2%
3 15120
 
1.3%
5 15120
 
1.3%
7 7560
 
0.6%
8 7560
 
0.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1081080
91.1%
Uppercase Letter 105840
 
8.9%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 446040
41.3%
2 234360
21.7%
1 173880
 
16.1%
6 136080
 
12.6%
4 37800
 
3.5%
3 15120
 
1.4%
5 15120
 
1.4%
7 7560
 
0.7%
8 7560
 
0.7%
9 7560
 
0.7%
Uppercase Letter
ValueCountFrequency (%)
A 105840
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 1081080
91.1%
Latin 105840
 
8.9%

Most frequent character per script

Common
ValueCountFrequency (%)
0 446040
41.3%
2 234360
21.7%
1 173880
 
16.1%
6 136080
 
12.6%
4 37800
 
3.5%
3 15120
 
1.4%
5 15120
 
1.4%
7 7560
 
0.7%
8 7560
 
0.7%
9 7560
 
0.7%
Latin
ValueCountFrequency (%)
A 105840
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1186920
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 446040
37.6%
2 234360
19.7%
1 173880
 
14.6%
6 136080
 
11.5%
A 105840
 
8.9%
4 37800
 
3.2%
3 15120
 
1.3%
5 15120
 
1.3%
7 7560
 
0.6%
8 7560
 
0.6%

Sector
Categorical

UNIFORM 

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB
Total non-profit institutions: 21168
Total non-profit institutions excluding governments: 21168
Non-profit institutions serving households (community organizations): 21168
Business non-profit institutions: 21168
Government non-profit institutions: 21168

Length

Max length: 68
Median length: 34
Mean length: 42.8
Min length: 29

Characters and Unicode

Total characters: 4529952
Distinct characters: 29
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Total non-profit institutions
2nd row: Total non-profit institutions
3rd row: Total non-profit institutions
4th row: Total non-profit institutions
5th row: Total non-profit institutions

Common Values

Value | Count | Frequency (%)
Total non-profit institutions | 21168 | 20.0%
Total non-profit institutions excluding governments | 21168 | 20.0%
Non-profit institutions serving households (community organizations) | 21168 | 20.0%
Business non-profit institutions | 21168 | 20.0%
Government non-profit institutions | 21168 | 20.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
non-profit 105840
25.0%
institutions 105840
25.0%
total 42336
 
10.0%
excluding 21168
 
5.0%
governments 21168
 
5.0%
serving 21168
 
5.0%
households 21168
 
5.0%
community 21168
 
5.0%
organizations 21168
 
5.0%
business 21168
 
5.0%

Most occurring characters

ValueCountFrequency (%)
n 613872
13.6%
i 550368
12.1%
t 550368
12.1%
o 508032
11.2%
s 381024
 
8.4%
317520
 
7.0%
r 190512
 
4.2%
u 190512
 
4.2%
e 169344
 
3.7%
f 105840
 
2.3%
Other values (19) 952560
21.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3958416
87.4%
Space Separator 317520
 
7.0%
Dash Punctuation 105840
 
2.3%
Uppercase Letter 105840
 
2.3%
Open Punctuation 21168
 
0.5%
Close Punctuation 21168
 
0.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 613872
15.5%
i 550368
13.9%
t 550368
13.9%
o 508032
12.8%
s 381024
9.6%
r 190512
 
4.8%
u 190512
 
4.8%
e 169344
 
4.3%
f 105840
 
2.7%
p 105840
 
2.7%
Other values (11) 592704
15.0%
Uppercase Letter
ValueCountFrequency (%)
T 42336
40.0%
N 21168
20.0%
B 21168
20.0%
G 21168
20.0%
Space Separator
ValueCountFrequency (%)
317520
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 105840
100.0%
Open Punctuation
ValueCountFrequency (%)
( 21168
100.0%
Close Punctuation
ValueCountFrequency (%)
) 21168
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 4064256
89.7%
Common 465696
 
10.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 613872
15.1%
i 550368
13.5%
t 550368
13.5%
o 508032
12.5%
s 381024
9.4%
r 190512
 
4.7%
u 190512
 
4.7%
e 169344
 
4.2%
f 105840
 
2.6%
p 105840
 
2.6%
Other values (15) 698544
17.2%
Common
ValueCountFrequency (%)
317520
68.2%
- 105840
 
22.7%
( 21168
 
4.5%
) 21168
 
4.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4529952
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 613872
13.6%
i 550368
12.1%
t 550368
12.1%
o 508032
11.2%
s 381024
 
8.4%
317520
 
7.0%
r 190512
 
4.2%
u 190512
 
4.2%
e 169344
 
3.7%
f 105840
 
2.3%
Other values (19) 952560
21.0%

Characteristics
Categorical

UNIFORM 

Distinct: 18
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB
Male employees: 5880
Female employees: 5880
Immigrant employees: 5880
Non-immigrant employees: 5880
Indigenous identity employees: 5880
Other values (13): 76440

Length

Max length: 33
Median length: 28
Mean length: 19.5
Min length: 14

Characters and Unicode

Total characters: 2063880
Distinct characters: 37
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Male employees
2nd row: Male employees
3rd row: Male employees
4th row: Male employees
5th row: Male employees

Common Values

Value | Count | Frequency (%)
Male employees | 5880 | 5.6%
Female employees | 5880 | 5.6%
Immigrant employees | 5880 | 5.6%
Non-immigrant employees | 5880 | 5.6%
Indigenous identity employees | 5880 | 5.6%
Non-indigenous identity employees | 5880 | 5.6%
Visible minority | 5880 | 5.6%
Not a visible minority | 5880 | 5.6%
High school diploma and less | 5880 | 5.6%
Trade certificate | 5880 | 5.6%
Other values (8) | 47040 | 44.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
employees 35280
 
10.3%
years 35280
 
10.3%
to 29400
 
8.6%
and 17640
 
5.2%
identity 11760
 
3.4%
visible 11760
 
3.4%
minority 11760
 
3.4%
diploma 11760
 
3.4%
male 5880
 
1.7%
34 5880
 
1.7%
Other values (28) 164640
48.3%

Most occurring characters

ValueCountFrequency (%)
e 264600
12.8%
235200
 
11.4%
i 152880
 
7.4%
o 147000
 
7.1%
s 117600
 
5.7%
a 105840
 
5.1%
l 99960
 
4.8%
y 99960
 
4.8%
t 99960
 
4.8%
r 94080
 
4.6%
Other values (27) 646800
31.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1617000
78.3%
Space Separator 235200
 
11.4%
Decimal Number 129360
 
6.3%
Uppercase Letter 70560
 
3.4%
Dash Punctuation 11760
 
0.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 264600
16.4%
i 152880
9.5%
o 147000
9.1%
s 117600
 
7.3%
a 105840
 
6.5%
l 99960
 
6.2%
y 99960
 
6.2%
t 99960
 
6.2%
r 94080
 
5.8%
n 94080
 
5.8%
Other values (10) 341040
21.1%
Uppercase Letter
ValueCountFrequency (%)
N 17640
25.0%
I 11760
16.7%
H 5880
 
8.3%
T 5880
 
8.3%
C 5880
 
8.3%
U 5880
 
8.3%
V 5880
 
8.3%
F 5880
 
8.3%
M 5880
 
8.3%
Decimal Number
ValueCountFrequency (%)
5 47040
36.4%
4 41160
31.8%
2 11760
 
9.1%
3 11760
 
9.1%
6 11760
 
9.1%
1 5880
 
4.5%
Space Separator
ValueCountFrequency (%)
235200
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 11760
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1687560
81.8%
Common 376320
 
18.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 264600
15.7%
i 152880
 
9.1%
o 147000
 
8.7%
s 117600
 
7.0%
a 105840
 
6.3%
l 99960
 
5.9%
y 99960
 
5.9%
t 99960
 
5.9%
r 94080
 
5.6%
n 94080
 
5.6%
Other values (19) 411600
24.4%
Common
ValueCountFrequency (%)
235200
62.5%
5 47040
 
12.5%
4 41160
 
10.9%
2 11760
 
3.1%
3 11760
 
3.1%
- 11760
 
3.1%
6 11760
 
3.1%
1 5880
 
1.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2063880
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 264600
12.8%
235200
 
11.4%
i 152880
 
7.4%
o 147000
 
7.1%
s 117600
 
5.7%
a 105840
 
5.1%
l 99960
 
4.8%
y 99960
 
4.8%
t 99960
 
4.8%
r 94080
 
4.6%
Other values (27) 646800
31.3%

Indicators
Categorical

HIGH CORRELATION  UNIFORM 

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB
Number of jobs: 15120
Hours worked: 15120
Wages and salaries: 15120
Average annual hours worked: 15120
Average weekly hours worked: 15120
Other values (2): 30240

Length

Max length: 33
Median length: 19
Mean length: 21.428571
Min length: 12

Characters and Unicode

Total characters: 2268000
Distinct characters: 25
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Number of jobs
2nd row: Hours worked
3rd row: Wages and salaries
4th row: Average annual hours worked
5th row: Average weekly hours worked

Common Values

Value | Count | Frequency (%)
Number of jobs | 15120 | 14.3%
Hours worked | 15120 | 14.3%
Wages and salaries | 15120 | 14.3%
Average annual hours worked | 15120 | 14.3%
Average weekly hours worked | 15120 | 14.3%
Average annual wages and salaries | 15120 | 14.3%
Average hourly wage | 15120 | 14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
average 60480
16.7%
hours 45360
12.5%
worked 45360
12.5%
wages 30240
8.3%
and 30240
8.3%
salaries 30240
8.3%
annual 30240
8.3%
number 15120
 
4.2%
of 15120
 
4.2%
jobs 15120
 
4.2%
Other values (3) 45360
12.5%

Most occurring characters

ValueCountFrequency (%)
e 287280
12.7%
a 257040
11.3%
257040
11.3%
r 211680
 
9.3%
s 151200
 
6.7%
o 136080
 
6.0%
u 105840
 
4.7%
g 105840
 
4.7%
n 90720
 
4.0%
l 90720
 
4.0%
Other values (15) 574560
25.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1905120
84.0%
Space Separator 257040
 
11.3%
Uppercase Letter 105840
 
4.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 287280
15.1%
a 257040
13.5%
r 211680
11.1%
s 151200
 
7.9%
o 136080
 
7.1%
u 105840
 
5.6%
g 105840
 
5.6%
n 90720
 
4.8%
l 90720
 
4.8%
w 90720
 
4.8%
Other values (10) 378000
19.8%
Uppercase Letter
ValueCountFrequency (%)
A 60480
57.1%
W 15120
 
14.3%
H 15120
 
14.3%
N 15120
 
14.3%
Space Separator
ValueCountFrequency (%)
257040
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2010960
88.7%
Common 257040
 
11.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 287280
14.3%
a 257040
12.8%
r 211680
10.5%
s 151200
 
7.5%
o 136080
 
6.8%
u 105840
 
5.3%
g 105840
 
5.3%
n 90720
 
4.5%
l 90720
 
4.5%
w 90720
 
4.5%
Other values (14) 483840
24.1%
Common
ValueCountFrequency (%)
257040
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2268000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 287280
12.7%
a 257040
11.3%
257040
11.3%
r 211680
 
9.3%
s 151200
 
6.7%
o 136080
 
6.0%
u 105840
 
4.7%
g 105840
 
4.7%
n 90720
 
4.0%
l 90720
 
4.0%
Other values (15) 574560
25.3%

UOM
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB
Hours: 45360
Dollars: 45360
Jobs: 15120

Length

Max length: 7
Median length: 5
Mean length: 5.7142857
Min length: 4

Characters and Unicode

Total characters: 604800
Distinct characters: 10
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Jobs
2nd row: Hours
3rd row: Dollars
4th row: Hours
5th row: Hours

Common Values

Value | Count | Frequency (%)
Hours | 45360 | 42.9%
Dollars | 45360 | 42.9%
Jobs | 15120 | 14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
hours 45360
42.9%
dollars 45360
42.9%
jobs 15120
 
14.3%

Most occurring characters

ValueCountFrequency (%)
o 105840
17.5%
s 105840
17.5%
r 90720
15.0%
l 90720
15.0%
H 45360
7.5%
u 45360
7.5%
D 45360
7.5%
a 45360
7.5%
J 15120
 
2.5%
b 15120
 
2.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 498960
82.5%
Uppercase Letter 105840
 
17.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 105840
21.2%
s 105840
21.2%
r 90720
18.2%
l 90720
18.2%
u 45360
9.1%
a 45360
9.1%
b 15120
 
3.0%
Uppercase Letter
ValueCountFrequency (%)
H 45360
42.9%
D 45360
42.9%
J 15120
 
14.3%

Most occurring scripts

ValueCountFrequency (%)
Latin 604800
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 105840
17.5%
s 105840
17.5%
r 90720
15.0%
l 90720
15.0%
H 45360
7.5%
u 45360
7.5%
D 45360
7.5%
a 45360
7.5%
J 15120
 
2.5%
b 15120
 
2.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 604800
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 105840
17.5%
s 105840
17.5%
r 90720
15.0%
l 90720
15.0%
H 45360
7.5%
u 45360
7.5%
D 45360
7.5%
a 45360
7.5%
J 15120
 
2.5%
b 15120
 
2.5%

UOM_ID
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB
152: 45360
81: 45360
190: 15120

Length

Max length: 3
Median length: 3
Mean length: 2.5714286
Min length: 2

Characters and Unicode

Total characters: 272160
Distinct characters: 6
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 190
2nd row: 152
3rd row: 81
4th row: 152
5th row: 152

Common Values

Value | Count | Frequency (%)
152 | 45360 | 42.9%
81 | 45360 | 42.9%
190 | 15120 | 14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
152 45360
42.9%
81 45360
42.9%
190 15120
 
14.3%

Most occurring characters

ValueCountFrequency (%)
1 105840
38.9%
5 45360
16.7%
2 45360
16.7%
8 45360
16.7%
9 15120
 
5.6%
0 15120
 
5.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 272160
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 105840
38.9%
5 45360
16.7%
2 45360
16.7%
8 45360
16.7%
9 15120
 
5.6%
0 15120
 
5.6%

Most occurring scripts

ValueCountFrequency (%)
Common 272160
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1 105840
38.9%
5 45360
16.7%
2 45360
16.7%
8 45360
16.7%
9 15120
 
5.6%
0 15120
 
5.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 272160
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 105840
38.9%
5 45360
16.7%
2 45360
16.7%
8 45360
16.7%
9 15120
 
5.6%
0 15120
 
5.6%

SCALAR_FACTOR
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB
units: 75600
thousands: 15120
millions: 15120

Length

Max length: 9
Median length: 5
Mean length: 6
Min length: 5

Characters and Unicode

Total characters: 635040
Distinct characters: 11
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: units
2nd row: thousands
3rd row: millions
4th row: units
5th row: units

Common Values

Value | Count | Frequency (%)
units | 75600 | 71.4%
thousands | 15120 | 14.3%
millions | 15120 | 14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
units 75600
71.4%
thousands 15120
 
14.3%
millions 15120
 
14.3%

Most occurring characters

ValueCountFrequency (%)
s 120960
19.0%
n 105840
16.7%
i 105840
16.7%
u 90720
14.3%
t 90720
14.3%
o 30240
 
4.8%
l 30240
 
4.8%
h 15120
 
2.4%
a 15120
 
2.4%
d 15120
 
2.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 635040
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
s 120960
19.0%
n 105840
16.7%
i 105840
16.7%
u 90720
14.3%
t 90720
14.3%
o 30240
 
4.8%
l 30240
 
4.8%
h 15120
 
2.4%
a 15120
 
2.4%
d 15120
 
2.4%

Most occurring scripts

ValueCountFrequency (%)
Latin 635040
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
s 120960
19.0%
n 105840
16.7%
i 105840
16.7%
u 90720
14.3%
t 90720
14.3%
o 30240
 
4.8%
l 30240
 
4.8%
h 15120
 
2.4%
a 15120
 
2.4%
d 15120
 
2.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 635040
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
s 120960
19.0%
n 105840
16.7%
i 105840
16.7%
u 90720
14.3%
t 90720
14.3%
o 30240
 
4.8%
l 30240
 
4.8%
h 15120
 
2.4%
a 15120
 
2.4%
d 15120
 
2.4%

SCALAR_ID
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB
0: 75600
3: 15120
6: 15120

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 105840
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 3
3rd row: 6
4th row: 0
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 75600 | 71.4%
3 | 15120 | 14.3%
6 | 15120 | 14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0 75600
71.4%
3 15120
 
14.3%
6 15120
 
14.3%

Most occurring characters

ValueCountFrequency (%)
0 75600
71.4%
3 15120
 
14.3%
6 15120
 
14.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 105840
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 75600
71.4%
3 15120
 
14.3%
6 15120
 
14.3%

Most occurring scripts

ValueCountFrequency (%)
Common 105840
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 75600
71.4%
3 15120
 
14.3%
6 15120
 
14.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 105840
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 75600
71.4%
3 15120
 
14.3%
6 15120
 
14.3%

VECTOR
Text

Distinct: 8820
Distinct (%): 8.3%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB

Length

Max length: 11
Median length: 11
Mean length: 11
Min length: 11

Characters and Unicode

Total characters: 1164240
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: v1273033811
2nd row: v1273033812
3rd row: v1273033813
4th row: v1273033814
5th row: v1273033815

Value | Count | Frequency (%)
v1273033811 | 12 | < 0.1%
v1273033912 | 12 | < 0.1%
v1273034009 | 12 | < 0.1%
v1273034008 | 12 | < 0.1%
v1273034007 | 12 | < 0.1%
v1273033915 | 12 | < 0.1%
v1273033914 | 12 | < 0.1%
v1273033913 | 12 | < 0.1%
v1273033911 | 12 | < 0.1%
v1273034698 | 12 | < 0.1%
Other values (8810) | 105720 | 99.9%

Most occurring characters

ValueCountFrequency (%)
3 214332
18.4%
1 149892
12.9%
0 149784
12.9%
7 148584
12.8%
2 145476
12.5%
v 105840
9.1%
4 75516
 
6.5%
5 43944
 
3.8%
9 43944
 
3.8%
8 43812
 
3.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1058400
90.9%
Lowercase Letter 105840
 
9.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3 214332
20.3%
1 149892
14.2%
0 149784
14.2%
7 148584
14.0%
2 145476
13.7%
4 75516
 
7.1%
5 43944
 
4.2%
9 43944
 
4.2%
8 43812
 
4.1%
6 43116
 
4.1%
Lowercase Letter
ValueCountFrequency (%)
v 105840
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 1058400
90.9%
Latin 105840
 
9.1%

Most frequent character per script

Common
ValueCountFrequency (%)
3 214332
20.3%
1 149892
14.2%
0 149784
14.2%
7 148584
14.0%
2 145476
13.7%
4 75516
 
7.1%
5 43944
 
4.2%
9 43944
 
4.2%
8 43812
 
4.1%
6 43116
 
4.1%
Latin
ValueCountFrequency (%)
v 105840
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1164240
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3 214332
18.4%
1 149892
12.9%
0 149784
12.9%
7 148584
12.8%
2 145476
12.5%
v 105840
9.1%
4 75516
 
6.5%
5 43944
 
3.8%
9 43944
 
3.8%
8 43812
 
3.8%

COORDINATE
Text

Distinct: 8820
Distinct (%): 8.3%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB

Length

Max length: 9
Median length: 8.5
Mean length: 7.8571429
Min length: 7

Characters and Unicode

Total characters: 831600
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.1.1.1
2nd row: 1.1.1.2
3rd row: 1.1.1.3
4th row: 1.1.1.4
5th row: 1.1.1.5

Value | Count | Frequency (%)
1.1.1.1 | 12 | < 0.1%
1.1.2.4 | 12 | < 0.1%
1.1.3.3 | 12 | < 0.1%
1.1.3.2 | 12 | < 0.1%
1.1.3.1 | 12 | < 0.1%
1.1.2.7 | 12 | < 0.1%
1.1.2.6 | 12 | < 0.1%
1.1.2.5 | 12 | < 0.1%
1.1.2.3 | 12 | < 0.1%
1.1.10.6 | 12 | < 0.1%
Other values (8810) | 105720 | 99.9%

Most occurring characters

ValueCountFrequency (%)
. 317520
38.2%
1 153888
18.5%
2 63168
 
7.6%
3 63168
 
7.6%
4 63168
 
7.6%
5 55608
 
6.7%
6 34440
 
4.1%
7 34440
 
4.1%
8 19320
 
2.3%
0 13440
 
1.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 514080
61.8%
Other Punctuation 317520
38.2%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 153888
29.9%
2 63168
12.3%
3 63168
12.3%
4 63168
12.3%
5 55608
 
10.8%
6 34440
 
6.7%
7 34440
 
6.7%
8 19320
 
3.8%
0 13440
 
2.6%
9 13440
 
2.6%
Other Punctuation
ValueCountFrequency (%)
. 317520
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 831600
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
. 317520
38.2%
1 153888
18.5%
2 63168
 
7.6%
3 63168
 
7.6%
4 63168
 
7.6%
5 55608
 
6.7%
6 34440
 
4.1%
7 34440
 
4.1%
8 19320
 
2.3%
0 13440
 
1.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 831600
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
. 317520
38.2%
1 153888
18.5%
2 63168
 
7.6%
3 63168
 
7.6%
4 63168
 
7.6%
5 55608
 
6.7%
6 34440
 
4.1%
7 34440
 
4.1%
8 19320
 
2.3%
0 13440
 
1.6%

VALUE
Real number (ℝ)

MISSING 

Distinct: 35398
Distinct (%): 34.4%
Missing: 3024
Missing (%): 2.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 26419.543
Minimum: 0
Maximum: 3857813
Zeros: 9
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 827.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 18.6075
Q1: 32
Median: 1386
Q3: 17660
95-th percentile: 85576.5
Maximum: 3857813
Range: 3857813
Interquartile range (IQR): 17628

Descriptive statistics

Standard deviation: 118041.35
Coefficient of variation (CV): 4.4679558
Kurtosis: 266.93641
Mean: 26419.543
Median Absolute Deviation (MAD): 1358
Skewness: 13.831207
Sum: 2.7163518 × 10^9
Variance: 1.393376 × 10^10
Monotonicity: Not monotonic
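
Several of the VALUE figures above are derived from one another, and the relationships can be checked with plain arithmetic. The sketch below uses only numbers printed in this report. The variable names are illustrative, and the sum check assumes the mean is taken over non-missing rows only.

```python
import math

# Figures as reported for VALUE in this profile.
mean = 26419.543
std = 118041.35
q1, q3 = 32.0, 17660.0
minimum, maximum = 0.0, 3857813.0
n_rows, n_missing = 105840, 3024

cv = std / mean                            # coefficient of variation
iqr = q3 - q1                              # interquartile range
value_range = maximum - minimum            # range
variance = std ** 2                        # variance is the squared standard deviation
approx_sum = mean * (n_rows - n_missing)   # assumes the mean ignores missing rows

print(f"CV={cv:.7f} IQR={iqr:.0f} range={value_range:.0f}")
print(f"variance={variance:.6e} sum={approx_sum:.7e}")
```

Because the inputs are rounded to about eight significant figures, the recomputed values agree with the reported ones only to that precision.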
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
30 | 2040 | 1.9%
31 | 1949 | 1.8%
32 | 1844 | 1.7%
29 | 1562 | 1.5%
33 | 1476 | 1.4%
34 | 1005 | 0.9%
28 | 905 | 0.9%
35 | 638 | 0.6%
27 | 523 | 0.5%
36 | 457 | 0.4%
Other values (35388) | 90417 | 85.4%
(Missing) | 3024 | 2.9%
Minimum 10 values
Value | Count | Frequency (%)
0 | 9 | < 0.1%
1 | 219 | 0.2%
2 | 299 | 0.3%
3 | 201 | 0.2%
4 | 203 | 0.2%
5 | 157 | 0.1%
6 | 147 | 0.1%
7 | 135 | 0.1%
8 | 179 | 0.2%
9 | 145 | 0.1%

Maximum 10 values
Value | Count | Frequency (%)
3857813 | 1 | < 0.1%
3690238 | 1 | < 0.1%
3643952 | 1 | < 0.1%
3581317 | 1 | < 0.1%
3556730 | 1 | < 0.1%
3544822 | 1 | < 0.1%
3487265 | 1 | < 0.1%
3425218 | 1 | < 0.1%
3402343 | 1 | < 0.1%
3372266 | 1 | < 0.1%

STATUS
Categorical

CONSTANT  MISSING 

Distinct: 1
Distinct (%): < 0.1%
Missing: 102816
Missing (%): 97.1%
Memory size: 827.0 KiB
x: 3024

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters3024
Distinct characters1
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowx
2nd rowx
3rd rowx
4th rowx
5th rowx

Common Values

Value | Count | Frequency (%)
x | 3024 | 2.9%
(Missing) | 102816 | 97.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
x 3024
100.0%

Most occurring characters

ValueCountFrequency (%)
x 3024
100.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3024
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
x 3024
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 3024
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
x 3024
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3024
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
x 3024
100.0%

SYMBOL
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 105840
Missing (%): 100.0%
Memory size: 827.0 KiB

TERMINATED
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 105840
Missing (%): 100.0%
Memory size: 827.0 KiB

DECIMALS
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB
0: 90720
2: 15120

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters105840
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row0

Common Values

Value | Count | Frequency (%)
0 | 90720 | 85.7%
2 | 15120 | 14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0 90720
85.7%
2 15120
 
14.3%

Most occurring characters

ValueCountFrequency (%)
0 90720
85.7%
2 15120
 
14.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 105840
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 90720
85.7%
2 15120
 
14.3%

Most occurring scripts

ValueCountFrequency (%)
Common 105840
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 90720
85.7%
2 15120
 
14.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 105840
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 90720
85.7%
2 15120
 
14.3%

Interactions

[Four interaction plots; images not included]

Correlations

Variable | REF_DATE | VALUE | GEO | DGUID | Sector | Characteristics | Indicators | UOM | UOM_ID | SCALAR_FACTOR | SCALAR_ID | DECIMALS
REF_DATE | 1.000 | 0.027 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
VALUE | 0.027 | 1.000 | 0.090 | 0.090 | 0.051 | 0.044 | 0.075 | 0.081 | 0.081 | 0.105 | 0.105 | 0.044
GEO | 0.000 | 0.090 | 1.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
DGUID | 0.000 | 0.090 | 1.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
Sector | 0.000 | 0.051 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
Characteristics | 0.000 | 0.044 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
Indicators | 0.000 | 0.075 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
UOM | 0.000 | 0.081 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000 | 1.000 | 0.447 | 0.447 | 0.471
UOM_ID | 0.000 | 0.081 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000 | 1.000 | 0.447 | 0.447 | 0.471
SCALAR_FACTOR | 0.000 | 0.105 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.447 | 0.447 | 1.000 | 1.000 | 0.258
SCALAR_ID | 0.000 | 0.105 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.447 | 0.447 | 1.000 | 1.000 | 0.258
DECIMALS | 0.000 | 0.044 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.471 | 0.471 | 0.258 | 0.258 | 1.000
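
The report does not say which measure fills the categorical-versus-categorical cells. One plausible reading, offered here as an assumption, is Cramér's V. The sketch below computes the plain (not bias-corrected) Cramér's V for three pairs of columns. It builds the contingency tables from the Indicators-to-unit mapping visible in the sample rows and from the fact that each Indicators level occurs 15120 times, and it assumes that mapping holds for every row. The names `INDICATOR_ATTRS`, `contingency`, and `cramers_v` are hypothetical helpers.

```python
from collections import Counter
from math import sqrt

ROWS_PER_INDICATOR = 15120  # each Indicators level occurs 15120 times

# Indicators -> (UOM, SCALAR_FACTOR, DECIMALS), read off the sample rows.
# Assumption: this mapping is the same in every row of the dataset.
INDICATOR_ATTRS = {
    "Number of jobs": ("Jobs", "units", "0"),
    "Hours worked": ("Hours", "thousands", "0"),
    "Wages and salaries": ("Dollars", "millions", "0"),
    "Average annual hours worked": ("Hours", "units", "0"),
    "Average weekly hours worked": ("Hours", "units", "0"),
    "Average annual wages and salaries": ("Dollars", "units", "0"),
    "Average hourly wage": ("Dollars", "units", "2"),
}
FIELDS = {"UOM": 0, "SCALAR_FACTOR": 1, "DECIMALS": 2}


def contingency(a, b):
    """Joint counts of columns `a` and `b` as a Counter keyed by value pairs."""
    table = Counter()
    for attrs in INDICATOR_ATTRS.values():
        table[(attrs[FIELDS[a]], attrs[FIELDS[b]])] += ROWS_PER_INDICATOR
    return table


def cramers_v(table):
    """Plain Cramér's V: sqrt(phi^2 / min(rows - 1, cols - 1))."""
    row, col = Counter(), Counter()
    for (r, c), count in table.items():
        row[r] += count
        col[c] += count
    # phi^2 = chi^2 / n, using the identity chi^2 / n = sum(O^2 / (R * C)) - 1
    phi2 = sum(o * o / (row[r] * col[c]) for (r, c), o in table.items()) - 1.0
    k = min(len(row), len(col)) - 1
    return sqrt(max(phi2, 0.0) / k)


v_uom_scalar = cramers_v(contingency("UOM", "SCALAR_FACTOR"))
v_uom_decimals = cramers_v(contingency("UOM", "DECIMALS"))
v_scalar_decimals = cramers_v(contingency("SCALAR_FACTOR", "DECIMALS"))
print(round(v_uom_scalar, 3), round(v_uom_decimals, 3), round(v_scalar_decimals, 3))
```

Under these assumptions the three values round to the same figures that appear in the matrix for UOM/SCALAR_FACTOR, UOM/DECIMALS, and SCALAR_FACTOR/DECIMALS. With over 100,000 rows a bias correction would change them only far beyond the third decimal.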

Missing values

Figure: A simple visualization of nullity by column.
Figure: The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
Figure: The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
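
To make the idea of nullity correlation concrete, here is a minimal sketch that treats each column as a 0/1 "is present" indicator and takes the Pearson correlation of two such indicators. The function name `nullity_correlation` and the toy columns are hypothetical and are not taken from the dataset. The plotting library behind the heatmap may handle edge cases differently.

```python
from math import sqrt


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the 'is present' indicators of two columns."""
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None  # undefined when a column is fully present or fully missing
    return cov / sqrt(var_a * var_b)


# Hypothetical toy columns; None marks a missing cell.
value = [10.0, None, 7.5, None, 3.0, 8.0]
status = [None, "x", None, "x", None, None]  # present exactly where value is missing
complete = [1, 2, 3, 4, 5, 6]                # no missing cells

r_value_status = nullity_correlation(value, status)
r_value_complete = nullity_correlation(value, complete)
print(r_value_status, r_value_complete)
```

A result near -1 means one column tends to be present exactly when the other is missing, a result near +1 means they are missing together, and a column with no variation in missingness has no defined correlation.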

Sample

First rows
Index | REF_DATE | GEO | DGUID | Sector | Characteristics | Indicators | UOM | UOM_ID | SCALAR_FACTOR | SCALAR_ID | VECTOR | COORDINATE | VALUE | STATUS | SYMBOL | TERMINATED | DECIMALS
0 | 2010 | Canada | 2016A000011124 | Total non-profit institutions | Male employees | Number of jobs | Jobs | 190 | units | 0 | v1273033811 | 1.1.1.1 | 642584.00 | NaN | NaN | NaN | 0
1 | 2010 | Canada | 2016A000011124 | Total non-profit institutions | Male employees | Hours worked | Hours | 152 | thousands | 3 | v1273033812 | 1.1.1.2 | 1048516.00 | NaN | NaN | NaN | 0
2 | 2010 | Canada | 2016A000011124 | Total non-profit institutions | Male employees | Wages and salaries | Dollars | 81 | millions | 6 | v1273033813 | 1.1.1.3 | 30805.00 | NaN | NaN | NaN | 0
3 | 2010 | Canada | 2016A000011124 | Total non-profit institutions | Male employees | Average annual hours worked | Hours | 152 | units | 0 | v1273033814 | 1.1.1.4 | 1632.00 | NaN | NaN | NaN | 0
4 | 2010 | Canada | 2016A000011124 | Total non-profit institutions | Male employees | Average weekly hours worked | Hours | 152 | units | 0 | v1273033815 | 1.1.1.5 | 31.00 | NaN | NaN | NaN | 0
5 | 2010 | Canada | 2016A000011124 | Total non-profit institutions | Male employees | Average annual wages and salaries | Dollars | 81 | units | 0 | v1273033816 | 1.1.1.6 | 47940.00 | NaN | NaN | NaN | 0
6 | 2010 | Canada | 2016A000011124 | Total non-profit institutions | Male employees | Average hourly wage | Dollars | 81 | units | 0 | v1273033817 | 1.1.1.7 | 29.38 | NaN | NaN | NaN | 2
7 | 2010 | Canada | 2016A000011124 | Total non-profit institutions | Female employees | Number of jobs | Jobs | 190 | units | 0 | v1273033909 | 1.1.2.1 | 1500394.00 | NaN | NaN | NaN | 0
8 | 2010 | Canada | 2016A000011124 | Total non-profit institutions | Female employees | Hours worked | Hours | 152 | thousands | 3 | v1273033910 | 1.1.2.2 | 2331018.00 | NaN | NaN | NaN | 0
9 | 2010 | Canada | 2016A000011124 | Total non-profit institutions | Female employees | Wages and salaries | Dollars | 81 | millions | 6 | v1273033911 | 1.1.2.3 | 60943.00 | NaN | NaN | NaN | 0

Last rows
Index | REF_DATE | GEO | DGUID | Sector | Characteristics | Indicators | UOM | UOM_ID | SCALAR_FACTOR | SCALAR_ID | VECTOR | COORDINATE | VALUE | STATUS | SYMBOL | TERMINATED | DECIMALS
105830 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 55 to 64 years | Average weekly hours worked | Hours | 152 | units | 0 | v1273042530 | 14.5.17.5 | 33.00 | NaN | NaN | NaN | 0
105831 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 55 to 64 years | Average annual wages and salaries | Dollars | 81 | units | 0 | v1273042531 | 14.5.17.6 | 101380.00 | NaN | NaN | NaN | 0
105832 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 55 to 64 years | Average hourly wage | Dollars | 81 | units | 0 | v1273042532 | 14.5.17.7 | 59.98 | NaN | NaN | NaN | 2
105833 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 65 years old and over | Number of jobs | Jobs | 190 | units | 0 | v1273042624 | 14.5.18.1 | 27.00 | NaN | NaN | NaN | 0
105834 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 65 years old and over | Hours worked | Hours | 152 | thousands | 3 | v1273042625 | 14.5.18.2 | 30.00 | NaN | NaN | NaN | 0
105835 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 65 years old and over | Wages and salaries | Dollars | 81 | millions | 6 | v1273042626 | 14.5.18.3 | 2.00 | NaN | NaN | NaN | 0
105836 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 65 years old and over | Average annual hours worked | Hours | 152 | units | 0 | v1273042627 | 14.5.18.4 | 1111.00 | NaN | NaN | NaN | 0
105837 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 65 years old and over | Average weekly hours worked | Hours | 152 | units | 0 | v1273042628 | 14.5.18.5 | 21.00 | NaN | NaN | NaN | 0
105838 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 65 years old and over | Average annual wages and salaries | Dollars | 81 | units | 0 | v1273042629 | 14.5.18.6 | 74037.00 | NaN | NaN | NaN | 0
105839 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 65 years old and over | Average hourly wage | Dollars | 81 | units | 0 | v1273042630 | 14.5.18.7 | 66.63 | NaN | NaN | NaN | 2
Pandas Profiling Report

Overview

Dataset statistics

Number of variables: 17
Number of observations: 105840
Missing cells: 317520
Missing cells (%): 17.6%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 13.7 MiB
Average record size in memory: 136.0 B

Variable types

Numeric: 2
Categorical: 11
Text: 2
Unsupported: 2

Alerts

STATUS has constant value "" [Constant]
GEO is highly overall correlated with DGUID [High correlation]
DGUID is highly overall correlated with GEO [High correlation]
Indicators is highly overall correlated with UOM and 4 other fields [High correlation]
UOM is highly overall correlated with Indicators and 1 other field [High correlation]
UOM_ID is highly overall correlated with Indicators and 1 other field [High correlation]
SCALAR_FACTOR is highly overall correlated with Indicators and 1 other field [High correlation]
SCALAR_ID is highly overall correlated with Indicators and 1 other field [High correlation]
DECIMALS is highly overall correlated with Indicators [High correlation]
VALUE has 3024 (2.9%) missing values [Missing]
STATUS has 102816 (97.1%) missing values [Missing]
SYMBOL has 105840 (100.0%) missing values [Missing]
TERMINATED has 105840 (100.0%) missing values [Missing]
GEO is uniformly distributed [Uniform]
DGUID is uniformly distributed [Uniform]
Sector is uniformly distributed [Uniform]
Characteristics is uniformly distributed [Uniform]
Indicators is uniformly distributed [Uniform]
SYMBOL is an unsupported type; check if it needs cleaning or further analysis [Unsupported]
TERMINATED is an unsupported type; check if it needs cleaning or further analysis [Unsupported]
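
The missing-cell total in the dataset statistics can be reconciled with the per-column counts in the alerts. The short sketch below adds up the four columns flagged as having missing values and divides by the total number of cells. It assumes the remaining 13 columns have no missing values, which is consistent with the totals but not stated in the alerts. Variable names are illustrative.

```python
# Per-column missing counts as listed in the alerts.
missing_by_column = {
    "VALUE": 3024,
    "STATUS": 102816,
    "SYMBOL": 105840,
    "TERMINATED": 105840,
}
n_variables, n_observations = 17, 105840

missing_cells = sum(missing_by_column.values())
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

print(missing_cells, f"{missing_pct:.1f}%")
```

Note that the VALUE and STATUS counts add up to exactly one full column (3024 + 102816 = 105840), so the overall total equals three full columns out of seventeen.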

Reproduction

Analysis started: 2023-12-20 17:51:57.783188
Analysis finished: 2023-12-20 17:52:07.018656
Duration: 9.24 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

REF_DATE
Real number (ℝ)

Distinct: 12
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2015.5
Minimum: 2010
Maximum: 2021
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 827.0 KiB

Quantile statistics

Minimum: 2010
5-th percentile: 2010
Q1: 2012.75
Median: 2015.5
Q3: 2018.25
95-th percentile: 2021
Maximum: 2021
Range: 11
Interquartile range (IQR): 5.5

Descriptive statistics

Standard deviation: 3.4520688
Coefficient of variation (CV): 0.0017127605
Kurtosis: -1.216784
Mean: 2015.5
Median Absolute Deviation (MAD): 3
Skewness: 0
Sum: 2.1332052 × 10^8
Variance: 11.916779
Monotonicity: Increasing
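
Because REF_DATE takes each year from 2010 to 2021 exactly 8820 times, its summary statistics can be recomputed from the frequency table alone. The sketch below does that with weighted sums. It assumes the reported variance uses the sample (n - 1) denominator, which is a common default but is not stated in the report. Variable names are illustrative.

```python
from math import sqrt

years = range(2010, 2022)   # 2010 through 2021 inclusive
count_per_year = 8820       # every year occurs the same number of times

n = count_per_year * len(years)
total = count_per_year * sum(years)
mean = total / n

# Weighted sum of squared deviations; sample variance divides by n - 1.
ss = count_per_year * sum((y - mean) ** 2 for y in years)
variance = ss / (n - 1)
std = sqrt(variance)
cv = std / mean

print(n, total, mean)
print(f"variance={variance:.6f} std={std:.7f} cv={cv:.10f}")
```

The recomputed count, sum, and mean match the reported figures exactly, and the variance, standard deviation, and coefficient of variation agree to the printed precision.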
Histogram with fixed size bins (bins=12)
Common values
Value | Count | Frequency (%)
2010 | 8820 | 8.3%
2011 | 8820 | 8.3%
2012 | 8820 | 8.3%
2013 | 8820 | 8.3%
2014 | 8820 | 8.3%
2015 | 8820 | 8.3%
2016 | 8820 | 8.3%
2017 | 8820 | 8.3%
2018 | 8820 | 8.3%
2019 | 8820 | 8.3%
Other values (2) | 17640 | 16.7%

Minimum 10 values
Value | Count | Frequency (%)
2010 | 8820 | 8.3%
2011 | 8820 | 8.3%
2012 | 8820 | 8.3%
2013 | 8820 | 8.3%
2014 | 8820 | 8.3%
2015 | 8820 | 8.3%
2016 | 8820 | 8.3%
2017 | 8820 | 8.3%
2018 | 8820 | 8.3%
2019 | 8820 | 8.3%

Maximum 10 values
Value | Count | Frequency (%)
2021 | 8820 | 8.3%
2020 | 8820 | 8.3%
2019 | 8820 | 8.3%
2018 | 8820 | 8.3%
2017 | 8820 | 8.3%
2016 | 8820 | 8.3%
2015 | 8820 | 8.3%
2014 | 8820 | 8.3%
2013 | 8820 | 8.3%
2012 | 8820 | 8.3%

GEO
Categorical

HIGH CORRELATION  UNIFORM 

Distinct: 14
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB
Canada: 7560
Newfoundland and Labrador: 7560
Prince Edward Island: 7560
Nova Scotia: 7560
New Brunswick: 7560
Other values (9): 68040

Length

Max length25
Median length14.5
Mean length11.714286
Min length5

Characters and Unicode

Total characters1239840
Distinct characters34
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowCanada
2nd rowCanada
3rd rowCanada
4th rowCanada
5th rowCanada

Common Values

Value | Count | Frequency (%)
Canada | 7560 | 7.1%
Newfoundland and Labrador | 7560 | 7.1%
Prince Edward Island | 7560 | 7.1%
Nova Scotia | 7560 | 7.1%
New Brunswick | 7560 | 7.1%
Quebec | 7560 | 7.1%
Ontario | 7560 | 7.1%
Manitoba | 7560 | 7.1%
Saskatchewan | 7560 | 7.1%
Alberta | 7560 | 7.1%
Other values (4) | 30240 | 28.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
canada 7560
 
4.5%
newfoundland 7560
 
4.5%
territories 7560
 
4.5%
northwest 7560
 
4.5%
yukon 7560
 
4.5%
columbia 7560
 
4.5%
british 7560
 
4.5%
alberta 7560
 
4.5%
saskatchewan 7560
 
4.5%
manitoba 7560
 
4.5%
Other values (12) 90720
54.5%

Most occurring characters

ValueCountFrequency (%)
a 151200
 
12.2%
n 90720
 
7.3%
r 90720
 
7.3%
t 75600
 
6.1%
e 75600
 
6.1%
o 75600
 
6.1%
i 75600
 
6.1%
d 60480
 
4.9%
(space) 60480
 
4.9%
u 52920
 
4.3%
Other values (24) 430920
34.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1020600
82.3%
Uppercase Letter 158760
 
12.8%
Space Separator 60480
 
4.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 151200
14.8%
n 90720
 
8.9%
r 90720
 
8.9%
t 75600
 
7.4%
e 75600
 
7.4%
o 75600
 
7.4%
i 75600
 
7.4%
d 60480
 
5.9%
u 52920
 
5.2%
s 45360
 
4.4%
Other values (9) 226800
22.2%
Uppercase Letter
ValueCountFrequency (%)
N 37800
23.8%
B 15120
 
9.5%
S 15120
 
9.5%
C 15120
 
9.5%
I 7560
 
4.8%
E 7560
 
4.8%
P 7560
 
4.8%
L 7560
 
4.8%
Q 7560
 
4.8%
O 7560
 
4.8%
Other values (4) 30240
19.0%
Space Separator
ValueCountFrequency (%)
(space) 60480
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1179360
95.1%
Common 60480
 
4.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 151200
 
12.8%
n 90720
 
7.7%
r 90720
 
7.7%
t 75600
 
6.4%
e 75600
 
6.4%
o 75600
 
6.4%
i 75600
 
6.4%
d 60480
 
5.1%
u 52920
 
4.5%
s 45360
 
3.8%
Other values (23) 385560
32.7%
Common
ValueCountFrequency (%)
(space) 60480
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1239840
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 151200
 
12.2%
n 90720
 
7.3%
r 90720
 
7.3%
t 75600
 
6.1%
e 75600
 
6.1%
o 75600
 
6.1%
i 75600
 
6.1%
d 60480
 
4.9%
(space) 60480
 
4.9%
u 52920
 
4.3%
Other values (24) 430920
34.8%

DGUID
Categorical

HIGH CORRELATION  UNIFORM 

Distinct14
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size827.0 KiB
2016A000011124
7560 
2016A000210
7560 
2016A000211
7560 
2016A000212
7560 
2016A000213
7560 
Other values (9)
68040 

Length

Max length14
Median length11
Mean length11.214286
Min length11

Characters and Unicode

Total characters1186920
Distinct characters11
Distinct categories2 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row2016A000011124
2nd row2016A000011124
3rd row2016A000011124
4th row2016A000011124
5th row2016A000011124

Common Values

ValueCountFrequency (%)
2016A000011124 7560
 
7.1%
2016A000210 7560
 
7.1%
2016A000211 7560
 
7.1%
2016A000212 7560
 
7.1%
2016A000213 7560
 
7.1%
2016A000224 7560
 
7.1%
2016A000235 7560
 
7.1%
2016A000246 7560
 
7.1%
2016A000247 7560
 
7.1%
2016A000248 7560
 
7.1%
Other values (4) 30240
28.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2016a000011124 7560
 
7.1%
2016a000210 7560
 
7.1%
2016a000211 7560
 
7.1%
2016a000212 7560
 
7.1%
2016a000213 7560
 
7.1%
2016a000224 7560
 
7.1%
2016a000235 7560
 
7.1%
2016a000246 7560
 
7.1%
2016a000247 7560
 
7.1%
2016a000248 7560
 
7.1%
Other values (4) 30240
28.6%

Most occurring characters

ValueCountFrequency (%)
0 446040
37.6%
2 234360
19.7%
1 173880
 
14.6%
6 136080
 
11.5%
A 105840
 
8.9%
4 37800
 
3.2%
3 15120
 
1.3%
5 15120
 
1.3%
7 7560
 
0.6%
8 7560
 
0.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1081080
91.1%
Uppercase Letter 105840
 
8.9%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 446040
41.3%
2 234360
21.7%
1 173880
 
16.1%
6 136080
 
12.6%
4 37800
 
3.5%
3 15120
 
1.4%
5 15120
 
1.4%
7 7560
 
0.7%
8 7560
 
0.7%
9 7560
 
0.7%
Uppercase Letter
ValueCountFrequency (%)
A 105840
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 1081080
91.1%
Latin 105840
 
8.9%

Most frequent character per script

Common
ValueCountFrequency (%)
0 446040
41.3%
2 234360
21.7%
1 173880
 
16.1%
6 136080
 
12.6%
4 37800
 
3.5%
3 15120
 
1.4%
5 15120
 
1.4%
7 7560
 
0.7%
8 7560
 
0.7%
9 7560
 
0.7%
Latin
ValueCountFrequency (%)
A 105840
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1186920
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 446040
37.6%
2 234360
19.7%
1 173880
 
14.6%
6 136080
 
11.5%
A 105840
 
8.9%
4 37800
 
3.2%
3 15120
 
1.3%
5 15120
 
1.3%
7 7560
 
0.6%
8 7560
 
0.6%

Sector
Categorical

UNIFORM 

Distinct5
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size827.0 KiB
Total non-profit institutions
21168 
Total non-profit institutions excluding governments
21168 
Non-profit institutions serving households (community organizations)
21168 
Business non-profit institutions
21168 
Government non-profit institutions
21168 

Length

Max length68
Median length34
Mean length42.8
Min length29

Characters and Unicode

Total characters4529952
Distinct characters29
Distinct categories6 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowTotal non-profit institutions
2nd rowTotal non-profit institutions
3rd rowTotal non-profit institutions
4th rowTotal non-profit institutions
5th rowTotal non-profit institutions

Common Values

ValueCountFrequency (%)
Total non-profit institutions 21168
20.0%
Total non-profit institutions excluding governments 21168
20.0%
Non-profit institutions serving households (community organizations) 21168
20.0%
Business non-profit institutions 21168
20.0%
Government non-profit institutions 21168
20.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
non-profit 105840
25.0%
institutions 105840
25.0%
total 42336
 
10.0%
excluding 21168
 
5.0%
governments 21168
 
5.0%
serving 21168
 
5.0%
households 21168
 
5.0%
community 21168
 
5.0%
organizations 21168
 
5.0%
business 21168
 
5.0%

Most occurring characters

ValueCountFrequency (%)
n 613872
13.6%
i 550368
12.1%
t 550368
12.1%
o 508032
11.2%
s 381024
 
8.4%
(space) 317520
 
7.0%
r 190512
 
4.2%
u 190512
 
4.2%
e 169344
 
3.7%
f 105840
 
2.3%
Other values (19) 952560
21.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3958416
87.4%
Space Separator 317520
 
7.0%
Dash Punctuation 105840
 
2.3%
Uppercase Letter 105840
 
2.3%
Open Punctuation 21168
 
0.5%
Close Punctuation 21168
 
0.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 613872
15.5%
i 550368
13.9%
t 550368
13.9%
o 508032
12.8%
s 381024
9.6%
r 190512
 
4.8%
u 190512
 
4.8%
e 169344
 
4.3%
f 105840
 
2.7%
p 105840
 
2.7%
Other values (11) 592704
15.0%
Uppercase Letter
ValueCountFrequency (%)
T 42336
40.0%
N 21168
20.0%
B 21168
20.0%
G 21168
20.0%
Space Separator
ValueCountFrequency (%)
(space) 317520
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 105840
100.0%
Open Punctuation
ValueCountFrequency (%)
( 21168
100.0%
Close Punctuation
ValueCountFrequency (%)
) 21168
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 4064256
89.7%
Common 465696
 
10.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 613872
15.1%
i 550368
13.5%
t 550368
13.5%
o 508032
12.5%
s 381024
9.4%
r 190512
 
4.7%
u 190512
 
4.7%
e 169344
 
4.2%
f 105840
 
2.6%
p 105840
 
2.6%
Other values (15) 698544
17.2%
Common
ValueCountFrequency (%)
(space) 317520
68.2%
- 105840
 
22.7%
( 21168
 
4.5%
) 21168
 
4.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4529952
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 613872
13.6%
i 550368
12.1%
t 550368
12.1%
o 508032
11.2%
s 381024
 
8.4%
(space) 317520
 
7.0%
r 190512
 
4.2%
u 190512
 
4.2%
e 169344
 
3.7%
f 105840
 
2.3%
Other values (19) 952560
21.0%

Characteristics
Categorical

UNIFORM 

Distinct18
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size827.0 KiB
Male employees
 
5880
Female employees
 
5880
Immigrant employees
 
5880
Non-immigrant employees
 
5880
Indigenous identity employees
 
5880
Other values (13)
76440 

Length

Max length33
Median length28
Mean length19.5
Min length14

Characters and Unicode

Total characters2063880
Distinct characters37
Distinct categories5 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowMale employees
2nd rowMale employees
3rd rowMale employees
4th rowMale employees
5th rowMale employees

Common Values

ValueCountFrequency (%)
Male employees 5880
 
5.6%
Female employees 5880
 
5.6%
Immigrant employees 5880
 
5.6%
Non-immigrant employees 5880
 
5.6%
Indigenous identity employees 5880
 
5.6%
Non-indigenous identity employees 5880
 
5.6%
Visible minority 5880
 
5.6%
Not a visible minority 5880
 
5.6%
High school diploma and less 5880
 
5.6%
Trade certificate 5880
 
5.6%
Other values (8) 47040
44.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
employees 35280
 
10.3%
years 35280
 
10.3%
to 29400
 
8.6%
and 17640
 
5.2%
identity 11760
 
3.4%
visible 11760
 
3.4%
minority 11760
 
3.4%
diploma 11760
 
3.4%
male 5880
 
1.7%
34 5880
 
1.7%
Other values (28) 164640
48.3%

Most occurring characters

ValueCountFrequency (%)
e 264600
12.8%
(space) 235200
 
11.4%
i 152880
 
7.4%
o 147000
 
7.1%
s 117600
 
5.7%
a 105840
 
5.1%
l 99960
 
4.8%
y 99960
 
4.8%
t 99960
 
4.8%
r 94080
 
4.6%
Other values (27) 646800
31.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1617000
78.3%
Space Separator 235200
 
11.4%
Decimal Number 129360
 
6.3%
Uppercase Letter 70560
 
3.4%
Dash Punctuation 11760
 
0.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 264600
16.4%
i 152880
9.5%
o 147000
9.1%
s 117600
 
7.3%
a 105840
 
6.5%
l 99960
 
6.2%
y 99960
 
6.2%
t 99960
 
6.2%
r 94080
 
5.8%
n 94080
 
5.8%
Other values (10) 341040
21.1%
Uppercase Letter
ValueCountFrequency (%)
N 17640
25.0%
I 11760
16.7%
H 5880
 
8.3%
T 5880
 
8.3%
C 5880
 
8.3%
U 5880
 
8.3%
V 5880
 
8.3%
F 5880
 
8.3%
M 5880
 
8.3%
Decimal Number
ValueCountFrequency (%)
5 47040
36.4%
4 41160
31.8%
2 11760
 
9.1%
3 11760
 
9.1%
6 11760
 
9.1%
1 5880
 
4.5%
Space Separator
ValueCountFrequency (%)
(space) 235200
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 11760
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1687560
81.8%
Common 376320
 
18.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 264600
15.7%
i 152880
 
9.1%
o 147000
 
8.7%
s 117600
 
7.0%
a 105840
 
6.3%
l 99960
 
5.9%
y 99960
 
5.9%
t 99960
 
5.9%
r 94080
 
5.6%
n 94080
 
5.6%
Other values (19) 411600
24.4%
Common
ValueCountFrequency (%)
(space) 235200
62.5%
5 47040
 
12.5%
4 41160
 
10.9%
2 11760
 
3.1%
3 11760
 
3.1%
- 11760
 
3.1%
6 11760
 
3.1%
1 5880
 
1.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2063880
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 264600
12.8%
(space) 235200
 
11.4%
i 152880
 
7.4%
o 147000
 
7.1%
s 117600
 
5.7%
a 105840
 
5.1%
l 99960
 
4.8%
y 99960
 
4.8%
t 99960
 
4.8%
r 94080
 
4.6%
Other values (27) 646800
31.3%

Indicators
Categorical

HIGH CORRELATION  UNIFORM 

Distinct7
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size827.0 KiB
Number of jobs
15120 
Hours worked
15120 
Wages and salaries
15120 
Average annual hours worked
15120 
Average weekly hours worked
15120 
Other values (2)
30240 

Length

Max length33
Median length19
Mean length21.428571
Min length12

Characters and Unicode

Total characters2268000
Distinct characters25
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowNumber of jobs
2nd rowHours worked
3rd rowWages and salaries
4th rowAverage annual hours worked
5th rowAverage weekly hours worked

Common Values

ValueCountFrequency (%)
Number of jobs 15120
14.3%
Hours worked 15120
14.3%
Wages and salaries 15120
14.3%
Average annual hours worked 15120
14.3%
Average weekly hours worked 15120
14.3%
Average annual wages and salaries 15120
14.3%
Average hourly wage 15120
14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
average 60480
16.7%
hours 45360
12.5%
worked 45360
12.5%
wages 30240
8.3%
and 30240
8.3%
salaries 30240
8.3%
annual 30240
8.3%
number 15120
 
4.2%
of 15120
 
4.2%
jobs 15120
 
4.2%
Other values (3) 45360
12.5%

Most occurring characters

ValueCountFrequency (%)
e 287280
12.7%
a 257040
11.3%
(space) 257040
11.3%
r 211680
 
9.3%
s 151200
 
6.7%
o 136080
 
6.0%
u 105840
 
4.7%
g 105840
 
4.7%
n 90720
 
4.0%
l 90720
 
4.0%
Other values (15) 574560
25.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1905120
84.0%
Space Separator 257040
 
11.3%
Uppercase Letter 105840
 
4.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 287280
15.1%
a 257040
13.5%
r 211680
11.1%
s 151200
 
7.9%
o 136080
 
7.1%
u 105840
 
5.6%
g 105840
 
5.6%
n 90720
 
4.8%
l 90720
 
4.8%
w 90720
 
4.8%
Other values (10) 378000
19.8%
Uppercase Letter
ValueCountFrequency (%)
A 60480
57.1%
W 15120
 
14.3%
H 15120
 
14.3%
N 15120
 
14.3%
Space Separator
ValueCountFrequency (%)
(space) 257040
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2010960
88.7%
Common 257040
 
11.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 287280
14.3%
a 257040
12.8%
r 211680
10.5%
s 151200
 
7.5%
o 136080
 
6.8%
u 105840
 
5.3%
g 105840
 
5.3%
n 90720
 
4.5%
l 90720
 
4.5%
w 90720
 
4.5%
Other values (14) 483840
24.1%
Common
ValueCountFrequency (%)
(space) 257040
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2268000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 287280
12.7%
a 257040
11.3%
(space) 257040
11.3%
r 211680
 
9.3%
s 151200
 
6.7%
o 136080
 
6.0%
u 105840
 
4.7%
g 105840
 
4.7%
n 90720
 
4.0%
l 90720
 
4.0%
Other values (15) 574560
25.3%

UOM
Categorical

HIGH CORRELATION 

Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size827.0 KiB
Hours
45360 
Dollars
45360 
Jobs
15120 

Length

Max length7
Median length5
Mean length5.7142857
Min length4

Characters and Unicode

Total characters604800
Distinct characters10
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowJobs
2nd rowHours
3rd rowDollars
4th rowHours
5th rowHours

Common Values

ValueCountFrequency (%)
Hours 45360
42.9%
Dollars 45360
42.9%
Jobs 15120
 
14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
hours 45360
42.9%
dollars 45360
42.9%
jobs 15120
 
14.3%

Most occurring characters

ValueCountFrequency (%)
o 105840
17.5%
s 105840
17.5%
r 90720
15.0%
l 90720
15.0%
H 45360
7.5%
u 45360
7.5%
D 45360
7.5%
a 45360
7.5%
J 15120
 
2.5%
b 15120
 
2.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 498960
82.5%
Uppercase Letter 105840
 
17.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 105840
21.2%
s 105840
21.2%
r 90720
18.2%
l 90720
18.2%
u 45360
9.1%
a 45360
9.1%
b 15120
 
3.0%
Uppercase Letter
ValueCountFrequency (%)
H 45360
42.9%
D 45360
42.9%
J 15120
 
14.3%

Most occurring scripts

ValueCountFrequency (%)
Latin 604800
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 105840
17.5%
s 105840
17.5%
r 90720
15.0%
l 90720
15.0%
H 45360
7.5%
u 45360
7.5%
D 45360
7.5%
a 45360
7.5%
J 15120
 
2.5%
b 15120
 
2.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 604800
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 105840
17.5%
s 105840
17.5%
r 90720
15.0%
l 90720
15.0%
H 45360
7.5%
u 45360
7.5%
D 45360
7.5%
a 45360
7.5%
J 15120
 
2.5%
b 15120
 
2.5%

UOM_ID
Categorical

HIGH CORRELATION 

Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size827.0 KiB
152
45360 
81
45360 
190
15120 

Length

Max length3
Median length3
Mean length2.5714286
Min length2

Characters and Unicode

Total characters272160
Distinct characters6
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row190
2nd row152
3rd row81
4th row152
5th row152

Common Values

ValueCountFrequency (%)
152 45360
42.9%
81 45360
42.9%
190 15120
 
14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
152 45360
42.9%
81 45360
42.9%
190 15120
 
14.3%

Most occurring characters

ValueCountFrequency (%)
1 105840
38.9%
5 45360
16.7%
2 45360
16.7%
8 45360
16.7%
9 15120
 
5.6%
0 15120
 
5.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 272160
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 105840
38.9%
5 45360
16.7%
2 45360
16.7%
8 45360
16.7%
9 15120
 
5.6%
0 15120
 
5.6%

Most occurring scripts

ValueCountFrequency (%)
Common 272160
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1 105840
38.9%
5 45360
16.7%
2 45360
16.7%
8 45360
16.7%
9 15120
 
5.6%
0 15120
 
5.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 272160
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 105840
38.9%
5 45360
16.7%
2 45360
16.7%
8 45360
16.7%
9 15120
 
5.6%
0 15120
 
5.6%

SCALAR_FACTOR
Categorical

HIGH CORRELATION 

Distinct 3
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 827.0 KiB

units      75600
thousands  15120
millions   15120

Length

Max length 9
Median length 5
Mean length 6
Min length 5

Characters and Unicode

Total characters 635040
Distinct characters 11
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row units
2nd row thousands
3rd row millions
4th row units
5th row units

Common Values

Value      Count  Frequency (%)
units      75600  71.4%
thousands  15120  14.3%
millions   15120  14.3%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]
Value      Count  Frequency (%)
units      75600  71.4%
thousands  15120  14.3%
millions   15120  14.3%

Most occurring characters

Value  Count   Frequency (%)
s      120960  19.0%
n      105840  16.7%
i      105840  16.7%
u      90720   14.3%
t      90720   14.3%
o      30240   4.8%
l      30240   4.8%
h      15120   2.4%
a      15120   2.4%
d      15120   2.4%

Most occurring categories

Value             Count   Frequency (%)
Lowercase Letter  635040  100.0%

Most frequent character per category

Lowercase Letter
Value  Count   Frequency (%)
s      120960  19.0%
n      105840  16.7%
i      105840  16.7%
u      90720   14.3%
t      90720   14.3%
o      30240   4.8%
l      30240   4.8%
h      15120   2.4%
a      15120   2.4%
d      15120   2.4%

Most occurring scripts

Value  Count   Frequency (%)
Latin  635040  100.0%

Most frequent character per script

Latin
Value  Count   Frequency (%)
s      120960  19.0%
n      105840  16.7%
i      105840  16.7%
u      90720   14.3%
t      90720   14.3%
o      30240   4.8%
l      30240   4.8%
h      15120   2.4%
a      15120   2.4%
d      15120   2.4%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  635040  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
s      120960  19.0%
n      105840  16.7%
i      105840  16.7%
u      90720   14.3%
t      90720   14.3%
o      30240   4.8%
l      30240   4.8%
h      15120   2.4%
a      15120   2.4%
d      15120   2.4%

SCALAR_ID
Categorical

HIGH CORRELATION 

Distinct 3
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 827.0 KiB

0  75600
3  15120
6  15120

Length

Max length 1
Median length 1
Mean length 1
Min length 1

Characters and Unicode

Total characters 105840
Distinct characters 3
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 0
2nd row 3
3rd row 6
4th row 0
5th row 0

Common Values

Value  Count  Frequency (%)
0      75600  71.4%
3      15120  14.3%
6      15120  14.3%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]
Value  Count  Frequency (%)
0      75600  71.4%
3      15120  14.3%
6      15120  14.3%

Most occurring characters

Value  Count  Frequency (%)
0      75600  71.4%
3      15120  14.3%
6      15120  14.3%

Most occurring categories

Value           Count   Frequency (%)
Decimal Number  105840  100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      75600  71.4%
3      15120  14.3%
6      15120  14.3%

Most occurring scripts

Value   Count   Frequency (%)
Common  105840  100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      75600  71.4%
3      15120  14.3%
6      15120  14.3%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  105840  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      75600  71.4%
3      15120  14.3%
6      15120  14.3%

VECTOR
Text

Distinct 8820
Distinct (%) 8.3%
Missing 0
Missing (%) 0.0%
Memory size 827.0 KiB

Length

Max length 11
Median length 11
Mean length 11
Min length 11

Characters and Unicode

Total characters 1164240
Distinct characters 11
Distinct categories 2
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row v1273033811
2nd row v1273033812
3rd row v1273033813
4th row v1273033814
5th row v1273033815

Value                Count   Frequency (%)
v1273033811          12      < 0.1%
v1273033912          12      < 0.1%
v1273034009          12      < 0.1%
v1273034008          12      < 0.1%
v1273034007          12      < 0.1%
v1273033915          12      < 0.1%
v1273033914          12      < 0.1%
v1273033913          12      < 0.1%
v1273033911          12      < 0.1%
v1273034698          12      < 0.1%
Other values (8810)  105720  99.9%

Most occurring characters

Value  Count   Frequency (%)
3      214332  18.4%
1      149892  12.9%
0      149784  12.9%
7      148584  12.8%
2      145476  12.5%
v      105840  9.1%
4      75516   6.5%
5      43944   3.8%
9      43944   3.8%
8      43812   3.8%

Most occurring categories

Value             Count    Frequency (%)
Decimal Number    1058400  90.9%
Lowercase Letter  105840   9.1%

Most frequent character per category

Decimal Number
Value  Count   Frequency (%)
3      214332  20.3%
1      149892  14.2%
0      149784  14.2%
7      148584  14.0%
2      145476  13.7%
4      75516   7.1%
5      43944   4.2%
9      43944   4.2%
8      43812   4.1%
6      43116   4.1%
Lowercase Letter
Value  Count   Frequency (%)
v      105840  100.0%

Most occurring scripts

Value   Count    Frequency (%)
Common  1058400  90.9%
Latin   105840   9.1%

Most frequent character per script

Common
Value  Count   Frequency (%)
3      214332  20.3%
1      149892  14.2%
0      149784  14.2%
7      148584  14.0%
2      145476  13.7%
4      75516   7.1%
5      43944   4.2%
9      43944   4.2%
8      43812   4.1%
6      43116   4.1%
Latin
Value  Count   Frequency (%)
v      105840  100.0%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  1164240  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
3      214332  18.4%
1      149892  12.9%
0      149784  12.9%
7      148584  12.8%
2      145476  12.5%
v      105840  9.1%
4      75516   6.5%
5      43944   3.8%
9      43944   3.8%
8      43812   3.8%

COORDINATE
Text

Distinct 8820
Distinct (%) 8.3%
Missing 0
Missing (%) 0.0%
Memory size 827.0 KiB

Length

Max length 9
Median length 8.5
Mean length 7.8571429
Min length 7

Characters and Unicode

Total characters 831600
Distinct characters 11
Distinct categories 2
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 1.1.1.1
2nd row 1.1.1.2
3rd row 1.1.1.3
4th row 1.1.1.4
5th row 1.1.1.5

Value                Count   Frequency (%)
1.1.1.1              12      < 0.1%
1.1.2.4              12      < 0.1%
1.1.3.3              12      < 0.1%
1.1.3.2              12      < 0.1%
1.1.3.1              12      < 0.1%
1.1.2.7              12      < 0.1%
1.1.2.6              12      < 0.1%
1.1.2.5              12      < 0.1%
1.1.2.3              12      < 0.1%
1.1.10.6             12      < 0.1%
Other values (8810)  105720  99.9%

Most occurring characters

Value  Count   Frequency (%)
.      317520  38.2%
1      153888  18.5%
2      63168   7.6%
3      63168   7.6%
4      63168   7.6%
5      55608   6.7%
6      34440   4.1%
7      34440   4.1%
8      19320   2.3%
0      13440   1.6%

Most occurring categories

Value              Count   Frequency (%)
Decimal Number     514080  61.8%
Other Punctuation  317520  38.2%

Most frequent character per category

Decimal Number
Value  Count   Frequency (%)
1      153888  29.9%
2      63168   12.3%
3      63168   12.3%
4      63168   12.3%
5      55608   10.8%
6      34440   6.7%
7      34440   6.7%
8      19320   3.8%
0      13440   2.6%
9      13440   2.6%
Other Punctuation
Value  Count   Frequency (%)
.      317520  100.0%

Most occurring scripts

Value   Count   Frequency (%)
Common  831600  100.0%

Most frequent character per script

Common
Value  Count   Frequency (%)
.      317520  38.2%
1      153888  18.5%
2      63168   7.6%
3      63168   7.6%
4      63168   7.6%
5      55608   6.7%
6      34440   4.1%
7      34440   4.1%
8      19320   2.3%
0      13440   1.6%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  831600  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
.      317520  38.2%
1      153888  18.5%
2      63168   7.6%
3      63168   7.6%
4      63168   7.6%
5      55608   6.7%
6      34440   4.1%
7      34440   4.1%
8      19320   2.3%
0      13440   1.6%

VALUE
Real number (ℝ)

MISSING 

Distinct 35398
Distinct (%) 34.4%
Missing 3024
Missing (%) 2.9%
Infinite 0
Infinite (%) 0.0%
Mean 26419.543
Minimum 0
Maximum 3857813
Zeros 9
Zeros (%) < 0.1%
Negative 0
Negative (%) 0.0%
Memory size 827.0 KiB

Quantile statistics

Minimum 0
5-th percentile 18.6075
Q1 32
median 1386
Q3 17660
95-th percentile 85576.5
Maximum 3857813
Range 3857813
Interquartile range (IQR) 17628

Descriptive statistics

Standard deviation 118041.35
Coefficient of variation (CV) 4.4679558
Kurtosis 266.93641
Mean 26419.543
Median Absolute Deviation (MAD) 1358
Skewness 13.831207
Sum 2.7163518 × 10^9
Variance 1.393376 × 10^10
Monotonicity Not monotonic
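
Several of the VALUE statistics above are simple functions of one another, which makes them easy to cross-check. The sketch below does that arithmetic using only figures transcribed from the report. It is a consistency check under the usual textbook definitions (IQR = Q3 - Q1, CV = standard deviation / mean, variance = standard deviation squared), not a description of how ydata-profiling computes them, and the recomputed sum is approximate because the reported mean is rounded.

```python
# Figures transcribed from the report for VALUE.
n_rows, n_missing = 105840, 3024
mean, std = 26419.543, 118041.35
q1, q3 = 32, 17660

n_valid = n_rows - n_missing             # observations that have a value
iqr = q3 - q1                            # interquartile range
cv = std / mean                          # coefficient of variation
variance = std ** 2
total = mean * n_valid                   # approximate: the mean is rounded
missing_pct = 100 * n_missing / n_rows   # share of missing cells

print(n_valid, iqr, round(missing_pct, 1))
print(f"{cv:.7f} {variance:.6e} {total:.6e}")
```

The recomputed values agree with the reported IQR, CV, variance, sum, and missing percentage to the precision shown in the report.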
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value                 Count  Frequency (%)
30                    2040   1.9%
31                    1949   1.8%
32                    1844   1.7%
29                    1562   1.5%
33                    1476   1.4%
34                    1005   0.9%
28                    905    0.9%
35                    638    0.6%
27                    523    0.5%
36                    457    0.4%
Other values (35388)  90417  85.4%
(Missing)             3024   2.9%

Minimum 10 values

Value  Count  Frequency (%)
0      9      < 0.1%
1      219    0.2%
2      299    0.3%
3      201    0.2%
4      203    0.2%
5      157    0.1%
6      147    0.1%
7      135    0.1%
8      179    0.2%
9      145    0.1%

Maximum 10 values

Value    Count  Frequency (%)
3857813  1      < 0.1%
3690238  1      < 0.1%
3643952  1      < 0.1%
3581317  1      < 0.1%
3556730  1      < 0.1%
3544822  1      < 0.1%
3487265  1      < 0.1%
3425218  1      < 0.1%
3402343  1      < 0.1%
3372266  1      < 0.1%

STATUS
Categorical

CONSTANT  MISSING 

Distinct 1
Distinct (%) < 0.1%
Missing 102816
Missing (%) 97.1%
Memory size 827.0 KiB

x  3024

Length

Max length 1
Median length 1
Mean length 1
Min length 1

Characters and Unicode

Total characters 3024
Distinct characters 1
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row x
2nd row x
3rd row x
4th row x
5th row x

Common Values

Value      Count   Frequency (%)
x          3024    2.9%
(Missing)  102816  97.1%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]
Value  Count  Frequency (%)
x      3024   100.0%

Most occurring characters

Value  Count  Frequency (%)
x      3024   100.0%

Most occurring categories

Value             Count  Frequency (%)
Lowercase Letter  3024   100.0%

Most frequent character per category

Lowercase Letter
Value  Count  Frequency (%)
x      3024   100.0%

Most occurring scripts

Value  Count  Frequency (%)
Latin  3024   100.0%

Most frequent character per script

Latin
Value  Count  Frequency (%)
x      3024   100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  3024   100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
x      3024   100.0%

SYMBOL
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing 105840
Missing (%) 100.0%
Memory size 827.0 KiB

TERMINATED
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing 105840
Missing (%) 100.0%
Memory size 827.0 KiB

DECIMALS
Categorical

HIGH CORRELATION 

Distinct 2
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 827.0 KiB

0  90720
2  15120

Length

Max length 1
Median length 1
Mean length 1
Min length 1

Characters and Unicode

Total characters 105840
Distinct characters 2
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 0
2nd row 0
3rd row 0
4th row 0
5th row 0

Common Values

Value  Count  Frequency (%)
0      90720  85.7%
2      15120  14.3%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]
Value  Count  Frequency (%)
0      90720  85.7%
2      15120  14.3%

Most occurring characters

Value  Count  Frequency (%)
0      90720  85.7%
2      15120  14.3%

Most occurring categories

Value           Count   Frequency (%)
Decimal Number  105840  100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      90720  85.7%
2      15120  14.3%

Most occurring scripts

Value   Count   Frequency (%)
Common  105840  100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      90720  85.7%
2      15120  14.3%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  105840  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      90720  85.7%
2      15120  14.3%

Interactions

[Figure: four interaction plots]

Correlations

[Figure: correlation heatmap]
                 REF_DATE  VALUE  GEO    DGUID  Sector  Characteristics  Indicators  UOM    UOM_ID  SCALAR_FACTOR  SCALAR_ID  DECIMALS
REF_DATE         1.000     0.027  0.000  0.000  0.000   0.000            0.000       0.000  0.000   0.000          0.000      0.000
VALUE            0.027     1.000  0.090  0.090  0.051   0.044            0.075       0.081  0.081   0.105          0.105      0.044
GEO              0.000     0.090  1.000  1.000  0.000   0.000            0.000       0.000  0.000   0.000          0.000      0.000
DGUID            0.000     0.090  1.000  1.000  0.000   0.000            0.000       0.000  0.000   0.000          0.000      0.000
Sector           0.000     0.051  0.000  0.000  1.000   0.000            0.000       0.000  0.000   0.000          0.000      0.000
Characteristics  0.000     0.044  0.000  0.000  0.000   1.000            0.000       0.000  0.000   0.000          0.000      0.000
Indicators       0.000     0.075  0.000  0.000  0.000   0.000            1.000       1.000  1.000   1.000          1.000      1.000
UOM              0.000     0.081  0.000  0.000  0.000   0.000            1.000       1.000  1.000   0.447          0.447      0.471
UOM_ID           0.000     0.081  0.000  0.000  0.000   0.000            1.000       1.000  1.000   0.447          0.447      0.471
SCALAR_FACTOR    0.000     0.105  0.000  0.000  0.000   0.000            1.000       0.447  0.447   1.000          1.000      0.258
SCALAR_ID        0.000     0.105  0.000  0.000  0.000   0.000            1.000       0.447  0.447   1.000          1.000      0.258
DECIMALS         0.000     0.044  0.000  0.000  0.000   0.000            1.000       0.471  0.471   0.258          0.258      1.000
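
The report does not say which coefficient it uses for pairs of categorical variables. As a hedged illustration, the sketch below assumes it is the uncorrected Cramér's V and recomputes the UOM versus SCALAR_FACTOR entry. The (UOM, SCALAR_FACTOR) pair for each of the seven indicators is read off the sample rows of this report, and each indicator is taken to cover 15120 rows (105840 / 7, since Indicators is flagged as uniformly distributed). The function name `cramers_v` and the variable names are made up for this example.

```python
import math
from collections import Counter

ROWS_PER_INDICATOR = 15120  # 105840 rows / 7 uniformly distributed indicators

# (UOM, SCALAR_FACTOR) for each indicator, read off the sample rows.
indicator_units = [
    ("Jobs", "units"),        # Number of jobs
    ("Hours", "thousands"),   # Hours worked
    ("Dollars", "millions"),  # Wages and salaries
    ("Hours", "units"),       # Average annual hours worked
    ("Hours", "units"),       # Average weekly hours worked
    ("Dollars", "units"),     # Average annual wages and salaries
    ("Dollars", "units"),     # Average hourly wage
]

def cramers_v(pairs, weight):
    """Uncorrected Cramer's V for (x, y) pairs, each repeated `weight` times."""
    joint = Counter()
    for pair in pairs:
        joint[pair] += weight
    row, col = Counter(), Counter()
    for (x, y), count in joint.items():
        row[x] += count
        col[y] += count
    # chi-squared / n equals sum(O^2 / (row total * column total)) - 1
    phi2 = sum(c * c / (row[x] * col[y]) for (x, y), c in joint.items()) - 1
    k = min(len(row), len(col)) - 1
    return math.sqrt(phi2 / k)

v = cramers_v(indicator_units, ROWS_PER_INDICATOR)
print(round(v, 3))
```

Under these assumptions the result rounds to 0.447, matching the UOM and SCALAR_FACTOR cell of the matrix, which suggests (but does not prove) that a Cramér's V style measure is what the report shows for categorical pairs.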

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
[Figure] The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
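
To make the idea of nullity correlation concrete, the sketch below computes the Pearson correlation between the missing-value indicators of two columns. The two lists are hypothetical toy data, not rows from this dataset. They mimic a pattern the report hints at but does not confirm: VALUE has 3024 missing cells and STATUS has exactly 3024 non-missing cells, so the two may be missing on complementary rows.

```python
import math

# Hypothetical toy columns; None marks a missing cell.
value = [10.0, None, 12.5, None, 8.0, 9.5]
status = [None, "x", None, "x", None, None]

def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)

# 1 where the cell is missing, 0 where it is present.
value_null = [1 if v is None else 0 for v in value]
status_null = [1 if s is None else 0 for s in status]

r = pearson(value_null, status_null)
print(round(r, 3))
```

A value near -1 means one column tends to be present exactly where the other is missing, a value near 1 means they tend to be missing together, and a value near 0 means the two missingness patterns are unrelated.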

Sample

First rows

(index) | REF_DATE | GEO | DGUID | Sector | Characteristics | Indicators | UOM | UOM_ID | SCALAR_FACTOR | SCALAR_ID | VECTOR | COORDINATE | VALUE | STATUS | SYMBOL | TERMINATED | DECIMALS
0 | 2010 | Canada | 2016A000011124 | Total non-profit institutions | Male employees | Number of jobs | Jobs | 190 | units | 0 | v1273033811 | 1.1.1.1 | 642584.00 | NaN | NaN | NaN | 0
1 | 2010 | Canada | 2016A000011124 | Total non-profit institutions | Male employees | Hours worked | Hours | 152 | thousands | 3 | v1273033812 | 1.1.1.2 | 1048516.00 | NaN | NaN | NaN | 0
2 | 2010 | Canada | 2016A000011124 | Total non-profit institutions | Male employees | Wages and salaries | Dollars | 81 | millions | 6 | v1273033813 | 1.1.1.3 | 30805.00 | NaN | NaN | NaN | 0
3 | 2010 | Canada | 2016A000011124 | Total non-profit institutions | Male employees | Average annual hours worked | Hours | 152 | units | 0 | v1273033814 | 1.1.1.4 | 1632.00 | NaN | NaN | NaN | 0
4 | 2010 | Canada | 2016A000011124 | Total non-profit institutions | Male employees | Average weekly hours worked | Hours | 152 | units | 0 | v1273033815 | 1.1.1.5 | 31.00 | NaN | NaN | NaN | 0
5 | 2010 | Canada | 2016A000011124 | Total non-profit institutions | Male employees | Average annual wages and salaries | Dollars | 81 | units | 0 | v1273033816 | 1.1.1.6 | 47940.00 | NaN | NaN | NaN | 0
6 | 2010 | Canada | 2016A000011124 | Total non-profit institutions | Male employees | Average hourly wage | Dollars | 81 | units | 0 | v1273033817 | 1.1.1.7 | 29.38 | NaN | NaN | NaN | 2
7 | 2010 | Canada | 2016A000011124 | Total non-profit institutions | Female employees | Number of jobs | Jobs | 190 | units | 0 | v1273033909 | 1.1.2.1 | 1500394.00 | NaN | NaN | NaN | 0
8 | 2010 | Canada | 2016A000011124 | Total non-profit institutions | Female employees | Hours worked | Hours | 152 | thousands | 3 | v1273033910 | 1.1.2.2 | 2331018.00 | NaN | NaN | NaN | 0
9 | 2010 | Canada | 2016A000011124 | Total non-profit institutions | Female employees | Wages and salaries | Dollars | 81 | millions | 6 | v1273033911 | 1.1.2.3 | 60943.00 | NaN | NaN | NaN | 0

Last rows

(index) | REF_DATE | GEO | DGUID | Sector | Characteristics | Indicators | UOM | UOM_ID | SCALAR_FACTOR | SCALAR_ID | VECTOR | COORDINATE | VALUE | STATUS | SYMBOL | TERMINATED | DECIMALS
105830 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 55 to 64 years | Average weekly hours worked | Hours | 152 | units | 0 | v1273042530 | 14.5.17.5 | 33.00 | NaN | NaN | NaN | 0
105831 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 55 to 64 years | Average annual wages and salaries | Dollars | 81 | units | 0 | v1273042531 | 14.5.17.6 | 101380.00 | NaN | NaN | NaN | 0
105832 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 55 to 64 years | Average hourly wage | Dollars | 81 | units | 0 | v1273042532 | 14.5.17.7 | 59.98 | NaN | NaN | NaN | 2
105833 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 65 years old and over | Number of jobs | Jobs | 190 | units | 0 | v1273042624 | 14.5.18.1 | 27.00 | NaN | NaN | NaN | 0
105834 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 65 years old and over | Hours worked | Hours | 152 | thousands | 3 | v1273042625 | 14.5.18.2 | 30.00 | NaN | NaN | NaN | 0
105835 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 65 years old and over | Wages and salaries | Dollars | 81 | millions | 6 | v1273042626 | 14.5.18.3 | 2.00 | NaN | NaN | NaN | 0
105836 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 65 years old and over | Average annual hours worked | Hours | 152 | units | 0 | v1273042627 | 14.5.18.4 | 1111.00 | NaN | NaN | NaN | 0
105837 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 65 years old and over | Average weekly hours worked | Hours | 152 | units | 0 | v1273042628 | 14.5.18.5 | 21.00 | NaN | NaN | NaN | 0
105838 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 65 years old and over | Average annual wages and salaries | Dollars | 81 | units | 0 | v1273042629 | 14.5.18.6 | 74037.00 | NaN | NaN | NaN | 0
105839 | 2021 | Nunavut | 2016A000262 | Government non-profit institutions | 65 years old and over | Average hourly wage | Dollars | 81 | units | 0 | v1273042630 | 14.5.18.7 | 66.63 | NaN | NaN | NaN | 2
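
The last sample row gives a quick way to sanity-check the dataset's shape. The sketch below assumes that each component of COORDINATE counts from 1 up to the value seen in the final row (14.5.18.7), that every combination of components occurs, and that each combination appears once per REF_DATE year. None of this is stated in the report, but the result can be compared against the reported distinct counts.

```python
# COORDINATE of the last sample row; components assumed to correspond to
# GEO, Sector, Characteristics and Indicators, in that order.
last_coordinate = "14.5.18.7"
n_ref_dates = 12  # distinct REF_DATE values reported (2010 to 2021)

# Number of distinct series if every combination of components occurs.
n_series = 1
for size in (int(part) for part in last_coordinate.split(".")):
    n_series *= size

n_rows = n_series * n_ref_dates
print(n_series, n_rows)
```

Under these assumptions the product gives 8820 series, matching the distinct counts of VECTOR and COORDINATE, and 105840 rows, matching the number of observations in the overview.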